ETAPS Foreword


Welcome to the 22nd ETAPS! This is the first time that ETAPS took place in the Czech 
Republic in its beautiful capital Prague. 

ETAPS 2019 was the 22nd instance of the European Joint Conferences on Theory 
and Practice of Software. ETAPS is an annual federated conference established in 
1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. 
Each conference has its own Program Committee (PC) and its own Steering Committee 
(SC). The conferences cover various aspects of software systems, ranging from
theoretical computer science to foundations to programming language developments,
analysis tools, formal approaches to software engineering, and security. 

Organizing these conferences in a coherent, highly synchronized conference program
enables participation in an exciting event, offering the possibility to meet many
researchers working in different directions in the field and to easily attend talks of 
different conferences. ETAPS 2019 featured a new program item: the Mentoring 
Workshop. This workshop is intended to help students early in the program with advice 
on research, career, and life in the fields of computing that are covered by the ETAPS 
conference. On the weekend before the main conference, numerous satellite workshops 
took place and attracted many researchers from all over the globe. 

ETAPS 2019 received 436 submissions in total, 137 of which were accepted, 
yielding an overall acceptance rate of 31.4%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their reviewing efforts, the PC members for their
contributions, and in particular the PC (co-)chairs for their hard work in running this entire
intensive process. Last but not least, my congratulations to all authors of the accepted 
papers! 

ETAPS 2019 featured the unifying invited speakers Marsha Chechik (University of 
Toronto) and Kathleen Fisher (Tufts University) and the conference-specific invited 
speakers (FoSSaCS) Thomas Colcombet (IRIF, France) and (TACAS) Cormac 
Flanagan (University of California at Santa Cruz). Invited tutorials were provided by 
Dirk Beyer (Ludwig Maximilian University) on software verification and Cesare 
Tinelli (University of Iowa) on SMT and its applications. On behalf of the ETAPS 
2019 attendees, I thank all the speakers for their inspiring and interesting talks!

ETAPS 2019 took place in Prague, Czech Republic, and was organized by Charles 
University. Charles University was founded in 1348 and was the first university in 
Central Europe. It currently hosts more than 50,000 students. ETAPS 2019 was further 
supported by the following associations and societies: ETAPS e.V., EATCS (European 
Association for Theoretical Computer Science), EAPLS (European Association for 
Programming Languages and Systems), and EASST (European Association of Software
Science and Technology). The local organization team consisted of Jan Vitek and
Jan Kofron (general chairs), Barbora Buhnova, Milan Ceska, Ryan Culpepper, Vojtech 
Horky, Paley Li, Petr Maj, Artem Pelenitsyn, and David Safranek. 

The ETAPS SC consists of an Executive Board, and representatives of the
individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and 
EASST. The Executive Board consists of Gilles Barthe (Madrid), Holger Hermanns 
(Saarbrücken), Joost-Pieter Katoen (chair, Aachen and Twente), Gerald Lüttgen
(Bamberg), Vladimiro Sassone (Southampton), Tarmo Uustalu (Reykjavik and 
Tallinn), and Lenore Zuck (Chicago). Other members of the SC are: Wil van der Aalst 
(Aachen), Dirk Beyer (Munich), Mikolaj Bojanczyk (Warsaw), Armin Biere (Linz), 
Luis Caires (Lisbon), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan), 
Jurriaan Hage (Utrecht), Rainer Hähnle (Darmstadt), Reiko Heckel (Leicester),
Panagiotis Katsaros (Thessaloniki), Barbara König (Duisburg), Kim G. Larsen 
(Aalborg), Matteo Maffei (Vienna), Tiziana Margaria (Limerick), Peter Müller
(Zurich), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau), 
Dave Parker (Birmingham), Andrew M. Pitts (Cambridge), Dave Sands (Gothenburg), 
Don Sannella (Edinburgh), Alex Simpson (Ljubljana), Gabriele Taentzer (Marburg), 
Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas Vojnar (Brno), Heike Wehrheim 
(Paderborn), Anton Wijs (Eindhoven), and Lijun Zhang (Beijing). 

I would like to take this opportunity to thank all speakers, attendees, organizers
of the satellite workshops, and Springer for their support. I hope you all enjoy the 
proceedings of ETAPS 2019. Finally, a big thanks to Jan and Jan and their local 
organization team for all their enormous efforts enabling a fantastic ETAPS in Prague! 

ETAPS SC Chair
ETAPS e.V. President 


Preface 


This volume contains the papers presented at the 28th European Symposium on 
Programming (ESOP 2019) held April 8-11, 2019, in Prague, Czech Republic. ESOP
is one of the European Joint Conferences on Theory and Practice of Software (ETAPS). 
It is devoted to fundamental issues in the specification, design, analysis, and
implementation of programming languages and systems.

The 28 papers in this volume were selected from 86 submissions based on
originality and quality. Each submission was reviewed by at least three Program Committee
(PC) members and external reviewers, with an average of 3.2 reviews per paper. 
Authors were given the opportunity to respond to the reviews of their papers during the 
rebuttal period, January 11-14, 2019. 

Each paper was assigned a guardian in the PC, who was in charge of making sure 
that additional reviews were solicited if necessary, and for presenting a summary of the 
reviews, author responses, and decision proposals at the physical PC meeting. All 
submissions, reviews, and author responses were considered during online discussion, 
which identified 52 submissions to be further discussed at the physical PC meeting held 
in Cascais, Portugal, January 19, 2019. All non-conflicted PC members participated in 
the discussion of each paper’s merits. 

The PC wrote summaries based on online discussions and on discussions during the 
physical PC meeting, to help authors understand decisions and improve the final 
version of their papers. Papers co-authored by members of the PC were held to a higher 
standard and were discussed first at the physical PC meeting. There were 11 such 
submissions of which five were accepted. Papers for which the PC chair had a conflict 
of interest were kindly handled by Shao Zhong. 

I would like to thank all who contributed to the success of the conference: the 
authors who submitted papers for consideration, the external reviewers, who provided 
expert reviews, and the Program Committee, who worked hard to provide detailed 
reviews, and engaged in deep discussions about the submissions. I am also grateful to 
have benefited from the experience of past ESOP PC chairs Amal Ahmed and Jan 
Vitek, and to the ESOP Steering Committee chairs, Giuseppe Castagna and Peter 
Thiemann, who provided essential advice for numerous procedural issues. I would like 
also to thank the ETAPS Steering Committee chair, Joost-Pieter Katoen, for his
dedicated work and blazing fast responsiveness.

EasyChair was used to handle submissions, online discussions, and proceedings 
editing. Finally, I would like to thank the NOVA Laboratory for Computer Science and 
Informatics and OutSystems SA for supporting the physical PC meeting and Joana 
Damaso for assisting with the organization. 

From Quadcopters to Helicopters:
Formal Verification to Eliminate 
Exploitable Bugs 
(Abstract of Invited Talk) 


Kathleen Fisher 


Computer Science Department, Tufts University 


For decades, formal methods have offered the promise of software that does not have 
exploitable bugs. Until recently, however, it has not been possible to verify software of 
sufficient complexity to be useful. Recently, that situation has changed. seL4 [1] is an
open-source operating system microkernel efficient enough to be used in a wide range 
of practical applications. It has been proven to be fully functionally correct, ensuring 
the absence of buffer overflows, null pointer exceptions, use-after-free errors, etc., and 
to enforce integrity and confidentiality properties. 

The CompCert Verifying C Compiler [2] maps source C programs to provably 
equivalent assembly language, ensuring the absence of exploitable bugs in the
compiler. A number of factors have enabled this revolution in the formal methods
community, including increased processor speed, better infrastructure like the 
Isabelle/HOL and Coq theorem provers, specialized logics for reasoning about 
low-level code, increasing levels of automation afforded by tactic languages and 
SAT/SMT solvers, and the decision to move away from trying to verify existing 
artifacts and instead focus on co-developing the code and the correctness proof. 

In this talk I will explore the promise and limitations of current formal methods 
techniques for producing useful software that provably does not contain exploitable 
bugs. I will discuss these issues in the context of DARPA’s HACMS program, which 
had as its goal the creation of high-assurance software for vehicles, including 
quad-copters, helicopters, and automobiles. This talk summarizes the goals and results 
of the HACMS program, which are described in more detail in a recent paper written 
by the speaker and the two other DARPA program managers who oversaw the 
HACMS program [3]. 

Time Credits and Time Receipts in Iris

Abstract. We present a machine-checked extension of the program logic
Iris with time credits and time receipts, two dual means of reasoning 
about time. Whereas time credits are used to establish an upper bound on 
a program’s execution time, time receipts can be used to establish a lower 
bound. More strikingly, time receipts can be used to prove that certain 
undesirable events—such as integer overflows—cannot occur until a very 
long time has elapsed. We present several machine-checked applications 
of time credits and time receipts, including an application where both 
concepts are exploited. 


“Alice: How long is forever? White Rabbit: Sometimes, just one second.” 
— Lewis Carroll, Alice in Wonderland 


1 Introduction 


A program logic, such as Hoare logic or Separation Logic, is a set of deduction 
rules that can be used to reason about the behavior of a program. To this day, 
considerable effort has been invested in developing ever-more-powerful program 
logics that control the extensional behavior of programs, that is, logics that 
guarantee that a program safely computes a valid final result. A lesser effort has 
been devoted to logics that allow reasoning not just about safety and functional 
correctness, but also about intensional aspects of a program’s behavior, such as 
its time consumption and space usage. 

In this paper, we are interested in narrowing the gap between these lines of 
work. We present a formal study of two mechanisms by which a standard program 
logic can be extended with means of reasoning about time. As a starting point, 
we take Iris [11-14], a powerful evolution of Concurrent Separation Logic [3]. We 
extend Iris with two elementary time-related concepts, namely time credits [1, 
4,9] and time receipts. 

Time credits and time receipts are independent concepts: it makes sense to 
extend a program logic with either of them in isolation or with both of them 
simultaneously. They are dual concepts: every computation step consumes one 
time credit and produces one time receipt. They are purely static: they do not 
exist at runtime. We view them as Iris assertions. Thus, they can appear in the 
correctness statements that we formulate about programs and in the proofs of 
these statements. 


© The Author(s) 2019 
L. Caires (Ed.): ESOP 2019, LNCS 11423, pp. 3-29, 2019. 

Time credits can be used to establish an upper bound on the execution time
of a program. Dually, time receipts can be used to establish a lower bound, 
and (as explained shortly) can be used to prove that certain undesirable events 
cannot occur until a very long time has elapsed. 

Until now, time credits have been presented as an ad hoc extension of some 
fixed flavor of Separation Logic [1,4,9]. In contrast, we propose a construction 
which in principle allows time credits to be introduced on top of an arbitrary 
“base logic”, provided this base logic is a sufficiently rich variety of Separation
Logic. In order to make our definitions and proofs more concrete, we use Iris as 
the base logic. Our construction involves composing the base logic with a program 
transformation that inserts a tick() instruction in front of every computation 
step. As far as a user of the composite logic is concerned, the tick() instruction 
and the assertion $1, which represents one time credit, are abstract: the only 
fact to which the user has access is the Hoare triple {$1} tick() {True}, which 
states that “tick() consumes one time credit”. 
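
The intuition behind the triple {$1} tick() {True} can be pictured with a small dynamic model. The sketch below is only an illustration: in the logic, time credits are purely static and leave no trace at runtime, whereas here a hypothetical `CreditMeter` class holds a counter and a hand-instrumented loop calls `tick()` before each iteration. All names in the block are assumptions introduced for this example.

```python
class OutOfCredits(Exception):
    """Raised when tick() is called although no time credit is left."""


class CreditMeter:
    """Hypothetical runtime stand-in for the static assertion $n."""

    def __init__(self, credits: int) -> None:
        if credits < 0:
            raise ValueError("credits must be a natural number")
        self.credits = credits

    def tick(self) -> None:
        # Dynamic analogue of {$1} tick() {True}: consume exactly one credit.
        if self.credits == 0:
            raise OutOfCredits("no time credit left")
        self.credits -= 1


def count_down(meter: CreditMeter, n: int) -> int:
    """A loop instrumented by hand: one tick() in front of each iteration."""
    steps = 0
    while n > 0:
        meter.tick()
        n -= 1
        steps += 1
    return steps
```

With five credits, `count_down(CreditMeter(5), 5)` completes; with fewer credits than iterations, the model raises `OutOfCredits`, which mirrors the fact that the corresponding Hoare triple could not be proved.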

There are two reasons why we choose Iris [12] as the base logic. First, in the 
proof of soundness of the composite logic, we must exhibit concrete definitions 
of tick and $1 such that {$1} tick() {True} holds. Several features of Iris, such as 
ghost state and shared invariants, play a key role in this construction. Second, 
at the user level, the power of Iris can also play a crucial role. To illustrate this, 
we present the first machine-checked reconstruction of Okasaki’s debits [19] in 
terms of time credits. The construction makes crucial use of both time credits 
and Iris’ ghost monotonic state and shared invariants. 

Time receipts are a new concept, a contribution of this paper. To extend 
a base logic with time receipts, we follow the exact same route as above: we 
compose the base logic with the same program transformation as above, which 
we refer to as “the tick translation”. In the eyes of a user of the composite logic, 
the tick() instruction and the assertion ⧗1, which represents one time receipt,
are again abstract: this time, the only published fact about tick is the triple
{True} tick() {⧗1}, which states that “tick() produces one time receipt”.

Thus far, the symmetry between time credits and time receipts seems perfect: 
whereas time credits allow establishing an upper bound on the cost of a program 
fragment, time receipts allow establishing a lower bound. This raises a pragmatic 
question, though: why invest effort, time and money into a formal proof that a 
piece of code is slow? What might be the point of such an endeavor? Taking 
inspiration from Clochard et al. [5], we answer this question by turning slowness 
into a quality. If there is a certain point at which a process might fail, then by 
showing that this process is slow, we can show that failure is far away into the 
future. More specifically, Clochard et al. propose two abstract types of integer 
counters, dubbed “one-time” integers and “peano” integers, and provide a paper 
proof that these counters cannot overflow in a feasible time: that is, it would take 
infeasible time (say, centuries) for an execution to reach a point where overflow 
actually occurs. To reflect this idea, we abandon the symmetry between time 
credits and time receipts and publish a fact about time receipts which has no 
counterpart on the time-credit side. This fact is an implication: ⧗N ⇛ False,
that is, “N time receipts imply False”. The global parameter N can be adjusted
so as to represent one’s idea of a running time that is infeasible, perhaps due 
to physical limitations, perhaps due to assumptions about the conditions in 
which the software is operated. In this paper, we explain what it means for the 
composite program logic to remain sound in the presence of this axiom, and 
provide a formal proof that Iris, extended with time receipts, is indeed sound. 
Furthermore, we verify that Clochard et al.’s ad hoc concepts of “one-time” 
integers and “peano” integers can be reconstructed in terms of time receipts, a 
more fundamental concept. 

Finally, to demonstrate the combined use of time credits and receipts, we 
present a proof of the Union-Find data structure, where credits are used to 
express an amortized time complexity bound and receipts are used to prove that 
a node’s integer rank cannot overflow, even if it is stored in very few bits. 

In summary, the contributions of this paper are as follows: 


1. A way of extending an off-the-shelf program logic with time credits and/or 
receipts, by composition with a program transformation. 

2. Extensions of Iris with time credits and receipts, accompanied with machine- 
checked proofs of soundness. 

3. A machine-checked reconstruction of Okasaki’s debits as a library in Iris with 
time credits. 

4. A machine-checked reconstruction of Clochard et al.’s “one-time” integers and 
“peano” integers in Iris with time receipts. 

5. A machine-checked verification of Union-Find in Iris with time credits and 
receipts, offering both an amortized complexity bound and a safety guarantee 
despite the use of machine integers of very limited width. 


All of the results reported in this paper have been checked in Coq [17]. 


2 A User’s Overview of Time Credits and Time Receipts 


2.1 Time Credits 


A small number of axioms, presented in Fig. 1, govern time credits. The assertion
$n denotes n time credits. The splitting axiom, a logical equivalence, means
that time credits can be split and combined. Because Iris is an affine logic, it is 
implicitly understood that time credits cannot be duplicated, but can be thrown 
away. 

The axiom timeless($n) means that time credits are independent of Iris’
step-indexing. In practice, this allows an Iris invariant that involves time credits to
be acquired without causing a “later” modality to appear [12, §5.7]. The reader
can safely ignore this detail. 

The last axiom, a Hoare triple, means that every computation step requires 
and consumes one time credit. As in Iris, the postconditions of our Hoare triples 
are λ-abstractions: they take as a parameter the return value of the term.
At this point, tick() can be thought of as a pseudo-instruction that has no
runtime effect and is implicitly inserted in front of every computation step.

$ : ℕ → iProp                  -- there is such a thing as “n time credits”
timeless($n)                   -- an Iris technicality
True ⇛ $0                      -- zero credits can be created out of thin air
$(n₁ + n₂) ≡ $n₁ ∗ $n₂         -- credits can be split and combined
tick : Val                     -- there is a tick pseudo-op
{$1} tick(v) {λw. w = v}       -- tick consumes one credit


Fig. 1. The axiomatic interface TCIntf of time credits 


⧗ : ℕ → iProp                        -- there is such a thing as “n time receipts”
timeless(⧗n)                         -- an Iris technicality
True ⇛ ⧗0                            -- zero receipts can be created out of thin air
⧗(n₁ + n₂) ≡ ⧗n₁ ∗ ⧗n₂               -- receipts can be split and combined
tick : Val                           -- there is a tick pseudo-op
{True} tick(v) {λw. w = v ∗ ⧗1}      -- tick produces one receipt
⧗N ⇛ False                           -- no machine runs for N time steps


Fig. 2. The axiomatic interface of exclusive time receipts (further enriched in Fig. 3) 


Time credits can be used to express worst-case time complexity guarantees. 
For instance, a sorting algorithm could have the following specification: 


{array(a, xs) ∗ n = |xs| ∗ $(6 n log n)}
  sort(a)
{array(a, xs′) ∧ xs′ = ...}


Here, array(a,xs) asserts the existence and unique ownership of an array at 
address a, holding the sequence of elements xs. This Hoare triple guarantees not 
only that the function call sort(a) runs safely and has the effect of sorting the 
array at address a, but also that sort(a) runs in at most 6 n log n time steps,
where n is the length of the sequence xs, that is, the length of the array. Indeed,
only 6 n log n time credits are provided in the precondition, so the algorithm does
not have permission to run for a greater number of steps. 
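
To make the role of the precondition concrete, here is a minimal sketch of a sorting routine that runs under a credit budget. It is not the algorithm or the cost model of the specification above: as a simplifying assumption, it charges one credit per element comparison rather than one per computation step, it uses a merge sort chosen only for illustration, and the names `merge_sort`, `sort_with_credits` and `OutOfCredits` are hypothetical.

```python
import math


class OutOfCredits(Exception):
    """Raised when a comparison is attempted with an empty budget."""


def merge_sort(xs, budget):
    """Return a sorted copy of xs, paying one credit per comparison.

    `budget` is a one-element list holding the remaining credits, so that
    recursive calls share and update the same counter.
    """
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], budget)
    right = merge_sort(xs[mid:], budget)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if budget[0] == 0:
            raise OutOfCredits("budget exhausted")
        budget[0] -= 1  # plays the role of tick(): one credit is consumed
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def sort_with_credits(xs):
    """Sort xs with a budget of 6 n log2 n credits; also return the credits spent."""
    n = len(xs)
    credits = math.ceil(6 * n * math.log2(n)) if n > 1 else 0
    budget = [credits]
    result = merge_sort(xs, budget)
    return result, credits - budget[0]
```

Under this simplified cost model the budget is never exhausted, since merge sort performs at most n log2 n comparisons for n ≥ 2. Handing `merge_sort` a deliberately small budget makes it fail, which is the dynamic counterpart of a proof obligation that cannot be discharged.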


2.2 Time Receipts 


In contrast with time credits, time receipts are a new concept, a contribution 
of this paper. We distinguish two forms of time receipts. The most basic form, 
exclusive time receipts, is the dual of time credits, in the sense that every
computation step produces one time receipt. The second form, persistent time receipts,
exhibits slightly different properties. Inspired by Clochard et al. [5], we show 
that time receipts can be used to prove that certain undesirable events, such as 
integer overflows, cannot occur unless a program is allowed to execute for a very, 
very long time—typically centuries. In the following, we explain that exclusive 
time receipts allow reconstructing Clochard et al.’s “one-time” integers [5, §3.2], 
which are so named because they are not duplicable, whereas persistent time 
receipts allow reconstructing their “peano” integers [5, §3.2], which are so named 
because they do not support unrestricted addition. 

Exclusive time receipts. The assertion ⧗n denotes n time receipts. Like time
credits, these time receipts are “exclusive”, by which we mean that they are not 
duplicable. The basic laws that govern exclusive time receipts appear in Fig. 2. 
They are the same laws that govern time credits, with two differences. The first 
difference is that time receipts are the dual of time credits: the specification of 
tick, in this case, states that every computation step produces one time receipt.¹
The second difference lies in the last axiom of Fig. 2, which has no analogue in 
Fig. 1, and which we explain below. 

In practice, how do we expect time receipts to be exploited? They can be used 
to prove lower bounds on the execution time of a program: if the Hoare triple 
{True} p {⧗n} holds, then the execution of the program p cannot terminate in
less than n steps. Inspired by Clochard et al. [5], we note that time receipts can 
also be used to prove that certain undesirable events cannot occur in a feasible 
time. This is done as follows. Let N be a fixed integer, chosen large enough 
that a modern processor cannot possibly execute N operations in a feasible 
time.² The last axiom of Fig. 2, ⧗N ⇛ False,³ states that N time receipts imply
a contradiction. This axiom informally means that we won’t compute for N
time steps, because we cannot, or because we promise not to do such a thing.
A consequence of this axiom is that ⧗n implies n < N: that is, if we have
observed n time steps, then n must be small. 

Adopting this axiom weakens the guarantee offered by the program logic. A 
Hoare triple {True} p {True} no longer implies that the program p is forever 
safe. Instead, it means that p is (N − 1)-safe: the execution of p cannot go wrong
until at least N − 1 steps have been taken. Because N is very large, for many
practical purposes, this is good enough. 
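
A quick calculation shows how large N can be in practice. The sketch below takes N = 2^63 and a rate of one billion operations per second, the figures quoted in the footnote on the choice of N. The length of a year (365.25 days) is an assumption made here for the arithmetic.

```python
N = 2 ** 63                              # bound on the number of steps
OPS_PER_SECOND = 10 ** 9                 # one billion operations per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumption: a year of 365.25 days

# Time needed to execute N operations at the assumed rate, in years.
years = N / OPS_PER_SECOND / SECONDS_PER_YEAR
print(f"{years:.1f} years")
```

The result is a little over 292 years, so an (N − 1)-safe program cannot go wrong within any realistic running time on such a machine.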

How can this axiom be exploited in practice? We hinted above that it can be 
used to prove the absence of certain integer overflows. Suppose that we wish to 
use signed w-bit machine integers as a representation of mathematical integers. 
(For instance, let w be 64.) Whenever we perform an arithmetic operation, such 
as an addition, we must prove that no overflow can occur. This is reflected in 
the specification of the addition of two machine integers: 


{ι(x₁) = n₁ ∗ ι(x₂) = n₂ ∗ −2^(w−1) ≤ n₁ + n₂ < 2^(w−1)}
  add(x₁, x₂)
{λx. ι(x) = n₁ + n₂}


Here, the variables xᵢ denote machine integers, while the auxiliary variables nᵢ
denote mathematical integers, and the function ι is the injection of machine
integers into mathematical integers. The conjunct −2^(w−1) ≤ n₁ + n₂ < 2^(w−1) in
the precondition represents an obligation to prove that no overflow can occur. 


¹ For now, we discuss time credits and time receipts separately, which is why we have
different specifications for tick in either case. They are combined in Sect. 6.

² For a specific example, let N be 2^63. Clochard et al. note that, even at the rate of one
billion operations per second, it takes more than 292 years to execute 2^63 operations.
On a 64-bit machine, 2^63 is also the maximum representable signed integer, plus one.

³ The connective ⇛ is an Iris view shift, that is, a transition that can involve a side
effect on ghost state.

Suppose now that the machine integers x₁ and x₂ represent the lengths of
two disjoint linked lists that we wish to concatenate. To construct each of these
lists, we must have spent a certain amount of time: as proofs of this work, let
us assume that the assertions ⧗n₁ and ⧗n₂ are at hand. Let us further assume
that the word size w is sufficiently large that it takes a very long time to count 
up to the largest machine integer. That is, let us make the following assumption: 


N ≤ 2^(w−1)    (large word size assumption)


(E.g., with N = 2^63 and w = 64, this holds.) Then, we can prove that the
addition of x₁ and x₂ is permitted. This goes as follows. From the separating
conjunction ⧗n₁ ∗ ⧗n₂, we get ⧗(n₁ + n₂). The existence of these time receipts
allows us to deduce 0 ≤ n₁ + n₂ < N, which implies 0 ≤ n₁ + n₂ < 2^(w−1). Thus,
the precondition of the addition operation add(x₁, x₂) is met.

In summary, we have just verified that the addition of two machine integers 
satisfies the following alternative specification: 


{ι(x₁) = n₁ ∗ ⧗n₁ ∗ ι(x₂) = n₂ ∗ ⧗n₂}
  add(x₁, x₂)
{λx. ι(x) = n₁ + n₂ ∗ ⧗(n₁ + n₂)}

This can be made more readable and more abstract by defining a “clock” to be 
a machine integer x accompanied with ι(x) time receipts:


clock(x) ≜ ∃n. (ι(x) = n ∗ ⧗n)

Then, the above specification of addition can be reformulated as follows: 


{clock(x₁) ∗ clock(x₂)}
  add(x₁, x₂)
{λx. clock(x) ∗ ι(x) = ι(x₁) + ι(x₂)}


In other words, clocks support unrestricted addition, without any risk of overflow. 
However, because time receipts cannot be duplicated, neither can clocks: clock(x)
does not entail clock(x) ∗ clock(x). In other words, a clock is uniquely owned.
One can think of a clock x as a hard-earned integer: the owner of this clock has 
spent x units of time to obtain it. 
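
The reasoning about clocks can be mimicked by a small dynamic model. The sketch below is an illustration under stated assumptions, not the Iris development: unique ownership is approximated by a `live` flag, the step counter is kept in a plain dictionary, and the names `Clock`, `tick` and `add` as defined here are hypothetical. The values N = 2^63 and w = 64 are the ones used in the example above.

```python
N = 2 ** 63                   # assumed global bound on the number of steps
W = 64                        # assumed word size, so that N <= 2**(W - 1)
MAX_INT = 2 ** (W - 1) - 1    # largest signed W-bit machine integer


class Clock:
    """A machine integer backed by as many time receipts as its value."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.live = True      # models unique ownership of the receipts


def tick(state: dict) -> Clock:
    """One computation step; yields a clock worth one receipt."""
    state["steps"] += 1
    if state["steps"] >= N:
        raise RuntimeError("N steps reached: outside the scope of the model")
    return Clock(1)


def add(c1: Clock, c2: Clock) -> Clock:
    """Add two clocks, consuming both, since clocks are not duplicable."""
    if c1 is c2 or not (c1.live and c2.live):
        raise ValueError("a clock can be used only once")
    c1.live = c2.live = False
    total = c1.value + c2.value
    # Receipts held <= steps taken < N <= 2**(W - 1), hence no overflow.
    assert total <= MAX_INT
    return Clock(total)
```

Because every unit held by a live clock was paid for by one call to `tick`, the sum of all live clocks never exceeds the number of steps taken, which is what keeps the addition within the range of machine integers.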

Clocks are a reconstruction of Clochard et al.’s “one-time integers” [5], which 
support unrestricted addition, but cannot be duplicated. Whereas Clochard et 
al. view one-time integers as a primitive concept, and offer a direct paper proof of 
their soundness, we have just reconstructed them in terms of a more elementary 
notion, namely time receipts, and in the setting of a more powerful program 
logic, whose soundness is machine-checked, namely Iris. 

Persistent time receipts. In addition to exclusive time receipts, it is useful
to introduce a persistent form of time receipts.⁴ The axioms that govern both
exclusive and persistent time receipts appear in Fig. 3. 


⁴ Instead of viewing persistent time receipts as a primitive concept, one could define
them as a library on top of exclusive time receipts. Unfortunately, this construction 
leads to slightly weaker laws, which is why we prefer to view them as primitive. 

⧗ : ℕ → iProp                    -- there is such a thing as “n exclusive time receipts”
⧖ : ℕ → iProp                    -- and “a persistent receipt for n steps”
timeless(⧗n) ∧ timeless(⧖n)      -- an Iris technicality
persistent(⧖n)                   -- persistent receipts are persistent
True ⇛ ⧗0                        -- zero receipts can be created out of thin air
⧗(n₁ + n₂) ≡ ⧗n₁ ∗ ⧗n₂           -- exclusive receipts obey addition
⧖max(n₁, n₂) ≡ ⧖n₁ ∗ ⧖n₂         -- persistent receipts obey maximum
⧗n ⇛ ⧗n ∗ ⧖n                     -- taking a snapshot of n exclusive receipts
                                    yields a persistent receipt for n steps
⧖N ⇛ False                       -- no machine runs for N time steps
tick : Val                       -- there is a tick pseudo-op
{⧖n}
  tick(v)                        -- tick produces one exclusive receipt,
{λw. w = v ∗ ⧗1 ∗ ⧖(n + 1)}         and can increment an existing persistent receipt


Fig. 3. The axiomatic interface TRIntf of time receipts 


We write ⧖n for a persistent receipt, a witness that at least n units of time
have elapsed. (We avoid the terminology “n persistent time receipts”, in the
plural form, because persistent time receipts are not additive. We view ⧖n as
one receipt whose face value is n.) This assertion is persistent, which in Iris
terminology means that once it holds, it holds forever. This implies, in particular,
that it is duplicable: ⧖n ≡ ⧖n ∗ ⧖n. It is created just by observing the existence
of n exclusive time receipts, as stated by the following axiom, also listed in Fig. 3:
⧗n ⇛ ⧗n ∗ ⧖n. Intuitively, someone who has access to the assertion ⧖n is
someone who knows that n units of work have been performed, even though they
have not necessarily “personally” performed that work. Because this knowledge
is not exclusive, the conjunction ⧖n₁ ∗ ⧖n₂ does not entail ⧖(n₁ + n₂). Instead,
we have the following axiom, also listed in Fig. 3: ⧖(max(n₁, n₂)) ≡ ⧖n₁ ∗ ⧖n₂.
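
The difference between the two combination laws can be checked on numbers. In the sketch below, a persistent receipt is modelled simply as a lower bound on the number of steps that have elapsed. This reading, and the function names, are assumptions made for the illustration.

```python
def snapshot(exclusive: int) -> int:
    """Observing n exclusive receipts yields a persistent receipt for n steps."""
    return exclusive


def combine_exclusive(n1: int, n2: int) -> int:
    """Exclusive receipts obey addition: each one accounts for a distinct step."""
    return n1 + n2


def combine_persistent(n1: int, n2: int) -> int:
    """Persistent receipts obey maximum: two witnesses may cover the same steps."""
    return max(n1, n2)


elapsed = 10              # steps actually taken so far
early = snapshot(4)       # a witness taken after 4 steps
late = snapshot(10)       # a witness taken after 10 steps

best = combine_persistent(early, late)   # 10: still a valid lower bound
unsound = early + late                   # 14: would exceed the elapsed time
```

Adding the two persistent receipts would claim that 14 steps have elapsed when only 10 have, whereas taking the maximum keeps the claim valid.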

More subtly, the specification of tick in Fig. 3 is stronger than the one in
Fig. 2. According to this strengthened specification, tick() does not just produce
an exclusive receipt ⧗1. In addition to that, if a persistent time receipt ⧖n is at
hand, then tick() is able to increment it and to produce a new persistent receipt
⧖(n + 1), thus reflecting the informal idea that a new unit of time has just been
spent. A user who does not wish to make use of this feature can pick n = 0 and
recover the specification of tick in Fig. 2 as a special case.

Finally, because ⧖n means that n steps have been taken, and because we
promise never to reach N steps, we adopt the axiom ⧖N ⇛ False, also listed in
Fig. 3. It implies the earlier axiom ⧗N ⇛ False, which is therefore not explicitly
shown in Fig. 3.

In practice, how are persistent time receipts exploited? By analogy with 
clocks, let us define a predicate for a machine integer x accompanied with ι(x)
persistent time receipts: 


snapclock(x) ≜ ∃n. (ι(x) = n ∗ ⧖n)

By construction, this predicate is persistent, therefore duplicable:

snapclock(x) ≡ snapclock(x) ∗ snapclock(x)


We refer to this concept as a “snapclock”, as it is not a clock, but can be thought 
of as a snapshot of some clock. Thanks to the axiom ⧗k ⇛ ⧗k ∗ ⧖k, we have:


clock(x) ⇛ clock(x) ∗ snapclock(x)


Furthermore, snapclocks have the valuable property that, by performing just 
one step of extra work, a snapclock can be incremented, yielding a new snapclock 
that is greater by one. That is, the following Hoare triple holds: 


{snapclock(x)}
  tick(); add(x, 1)
{λx′. snapclock(x′) ∗ ι(x′) = ι(x) + 1}


The proof is not difficult. Unfolding snapclock(x) in the precondition yields ⧖n,
where ι(x) = n. As per the strengthened specification of tick, the execution of
tick() then yields ⧗1 ∗ ⧖(n + 1). As in the case of clocks, the assertion ⧖(n + 1)
implies 0 ≤ n + 1 < 2^(w−1), which means that no overflow can occur. Finally, ⧗1
is thrown away and ⧖(n + 1) is used to justify snapclock(x′) in the postcondition.

Adding two arbitrary snapclocks x₁ and x₂ is illegal: from the sole assumption
snapclock(x₁) ∗ snapclock(x₂), one cannot prove that the addition of x₁ and x₂
won’t cause an overflow, and one cannot prove that its result is a valid snapclock.
However, snapclocks do support a restricted form of addition. The addition of
two snapclocks x₁ and x₂ is safe, and produces a valid snapclock x, provided it
is known ahead of time that its result is less than some preexisting snapclock y:


{ snapclock(x1) ∗ snapclock(x2) ∗ ι(x1 + x2) ≤ ι(y) ∗ snapclock(y) } 
add(x1, x2) 
{ λx. snapclock(x) ∗ ι(x) = ι(x1) + ι(x2) } 
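
The two snapclock triples can be mimicked at run time. The Python sketch below is only an illustration under stated assumptions: the word size W, the bound N = 2^(W−1), and the names Machine, checked_add, incr and add_bounded are hypothetical and do not come from the paper's development. The global step count stands in for the ghost state behind persistent time receipts, and the side condition on y is read as "at most".

```python
W = 8                 # assumed word size, kept small for illustration
N = 2 ** (W - 1)      # assumed bound: signed W-bit integers stay below N


class Machine:
    """Tracks how many steps have been taken; it never reaches N steps."""

    def __init__(self):
        self.steps = 0

    def tick(self):
        if self.steps + 1 >= N:
            raise RuntimeError("the machine promised never to reach N steps")
        self.steps += 1

    def is_snapclock(self, x):
        # x is a snapclock if x steps are known to have been taken already
        return 0 <= x <= self.steps


def checked_add(a, b):
    """Signed W-bit addition that fails loudly instead of wrapping around."""
    r = a + b
    if not (-N <= r < N):
        raise OverflowError(f"{a} + {b} overflows {W}-bit integers")
    return r


def incr(m, x):
    """tick (); add(x, 1): one step of work buys an incremented snapclock."""
    assert m.is_snapclock(x)
    m.tick()
    return checked_add(x, 1)


def add_bounded(m, x1, x2, y):
    """Restricted addition: allowed only below a preexisting snapclock y."""
    assert m.is_snapclock(x1) and m.is_snapclock(x2) and m.is_snapclock(y)
    assert x1 + x2 <= y, "side condition of the triple (read as 'at most')"
    return checked_add(x1, x2)
```

In this model, checked_add can never raise inside incr or add_bounded: the result is bounded by the step count, which itself stays below N.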


Snapclocks are a reconstruction of Clochard et al.’s “peano integers” [5], which 
are so named because they do not support unrestricted addition. Clocks and 
snapclocks represent different compromises: whereas clocks support addition but 
not duplication, snapclocks support duplication but not addition. They are useful 
in different scenarios: as a rule of thumb, if an integer counter is involved in the 
implementation of a mutable data structure, then one should attempt to view it 
as a clock; if it is involved in the implementation of a persistent data structure, 
then one should attempt to view it as a snapclock. 


3 HeapLang and the Tick Translation 


In the next section (Sect. 4), we extend Iris with time credits, yielding a new 
program logic Iris^$. We do this without modifying Iris. Instead, we compose 
Iris with a program transformation, the “tick translation”, which inserts tick() 
instructions into the code in front of every computation step. In the construction 
of Iris^⧗, our extension of Iris with time receipts, the tick translation is exploited 
in a similar way (Sect. 5). In this section, we define the tick translation and state 
some of its properties. 

Iris is a generic program logic: it can be instantiated with an arbitrary cal- 
culus for which a small-step operational semantics is available [12]. Ideally, our 
extension of Iris should take place at this generic level, so that it, too, can be 
instantiated for an arbitrary calculus. Unfortunately, it seems difficult to define 
the tick translation and to prove it correct in a generic manner. For this reason, 
we choose to work in the setting of HeapLang [12], an untyped λ-calculus 
equipped with Booleans, signed machine integers, products, sums, recursive 
functions, references, and shared-memory concurrency. The three standard 
operations on mutable references, namely allocation, reading, and writing, are 
available. A compare-and-set operation CAS(e1, e2, e3) and an operation for spawning 
a new thread are also provided. As the syntax and operational semantics of 
HeapLang are standard and largely irrelevant to this paper, we omit them. They 
appear in our online repository [17]. 

The tick translation transforms a HeapLang expression e to a HeapLang 
expression ((e))_tick. It is parameterized by a value tick. Its effect is to insert a 
call to tick in front of every operation in the source expression e. The translation 
of a function application, for instance, is as follows: 


((e1 (e2)))_tick ≜ tick (((e1))_tick) (((e2))_tick) 


For convenience, we assume that tick can be passed an arbitrary value v as an 
argument, and returns v. Because evaluation in HeapLang is call-by-value and 
happens to be right-to-left⁵, the above definition means that, after evaluating 
the argument ((e2))_tick and the function ((e1))_tick, we invoke tick, then carry on 
with the function call. This translation is syntactically well-behaved: it preserves 
the property of being a value, and commutes with substitution. This holds for 
every value tick. 
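
To make the shape of the translation concrete, here is a minimal Python sketch over a three-constructor λ-calculus. It is an illustration, not the paper's definition: the AST classes Var, Lam and App, the helper names, and the treatment of λ-abstractions (translate under the binder) are assumptions; only the application case is taken from the equation above. The variable named tick is a stand-in for the value tick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def translate(e, tick):
    """Tick translation: insert a call to `tick` in front of every application."""
    if isinstance(e, Var):
        return e
    if isinstance(e, Lam):
        # Assumed case: translate under the binder, so a value stays a value.
        return Lam(e.param, translate(e.body, tick))
    if isinstance(e, App):
        # ((e1 (e2)))_tick = tick (((e1))_tick) (((e2))_tick)
        return App(App(tick, translate(e.fn, tick)), translate(e.arg, tick))
    raise TypeError(f"unknown expression: {e!r}")


def is_value(e):
    """In this toy calculus, the only values are lambda-abstractions."""
    return isinstance(e, Lam)


def count_apps(e):
    """Number of application nodes in an expression."""
    if isinstance(e, App):
        return 1 + count_apps(e.fn) + count_apps(e.arg)
    if isinstance(e, Lam):
        return count_apps(e.body)
    return 0
```

Each source application becomes two applications in the output (one to call tick, one for the original call), and a λ-abstraction is mapped to a λ-abstraction, which reflects the "preserves the property of being a value" claim in this restricted setting.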


tick_c ≜ rec self (x) = 
    let k = !c in 
    if k = 0 then oops () 
    else if CAS(c, k, k − 1) then x else self (x) 


Fig. 4. Implementation of tick_c in HeapLang 


As far as the end user is concerned, tick remains abstract (Sect. 2). Yet, in our 
constructions of Iris^$ and Iris^⧗, we must provide a concrete implementation of 
it in HeapLang. This implementation, named tick_c, appears in Fig. 4. A global 
integer counter c stores the number of computation steps that the program is still 
allowed to take. The call tick_c () decrements the global counter c, if this counter 
holds a nonzero value, and otherwise invokes oops (). 


5 If HeapLang used left-to-right evaluation, the definition of the translation would be 
slightly different, but the lemmas that we prove would be the same. 

At this point, the memory location c and the value oops are parameters. 
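
The behavior of tick_c can be imitated in Python. This is a hedged sketch, not the HeapLang code itself: the class Counter (a cell with a lock-based compare-and-set) and the function make_tick are hypothetical names, and the while loop plays the role of the recursive call self (x).

```python
import threading


class Counter:
    """A shared memory cell with compare-and-set, standing in for location c."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def cas(self, expected, new):
        """Atomically replace `expected` by `new`; report whether it succeeded."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def make_tick(c, oops):
    """Build tick_c: decrement c if it is nonzero, otherwise invoke oops()."""

    def tick(x):
        while True:                 # retry loop, in place of `rec self`
            k = c.load()
            if k == 0:
                return oops()
            if c.cas(k, k - 1):
                return x            # tick returns its argument unchanged

    return tick
```

As in the paper, both the counter and oops are parameters here; later sections choose oops differently for time credits and for time receipts.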

We stress that tick_c plays a role only in the proofs of soundness of Iris^$ and 
Iris^⧗. It is never actually executed, nor is it shown to the end user. 

Once tick is instantiated with tick_c, one can prove that the translation is 
correct in the following sense: the translated code takes the same computation 
steps as the source code and additionally keeps track of how many steps are 
taken. More specifically, if the source code can make n computation steps, and 
if c is initialized with a value m that is sufficiently large (that is, m ≥ n), then 
the translated code can make n computation steps as well, and c is decremented 
from m to m − n in the process. 


Lemma 1 (Reduction Preservation). Assume there is a reduction sequence: 


(T1, σ1) →tp^n (T2, σ2) 


Assume c is fresh for this reduction sequence. Let m ≥ n. Then, there exists a 
reduction sequence: 


(((T1)), ((σ1))[c ← m]) →tp^* (((T2)), ((σ2))[c ← m − n]) 


In this statement, the metavariable T stands for a thread pool, while σ stands 
for a heap. The relation →tp is HeapLang’s “threadpool reduction”. For the sake 
of brevity, we write just ((e)) for ((e))_{tick_c}, that is, for the translation of the 
expression e, where tick is instantiated with tick_c. This notation is implicitly 
dependent on the parameters c and oops. 

The above lemma holds for every choice of oops. Indeed, because the counter c 
initially holds the value m, and because we have m ≥ n, the counter is never 
about to fall below zero, so oops is never invoked. 
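
The counting argument behind Reduction Preservation can be checked on a toy language. The Python sketch below is an assumption-laden illustration: expressions are integers or ("add", left, right) tuples, each addition counts as one computation step, and the names source_steps, run_translated and OutOfBudget are hypothetical. It does not model HeapLang's thread pools or heaps.

```python
class OutOfBudget(Exception):
    """Stands in for whatever oops () does when the counter holds zero."""


def source_steps(e):
    """Steps taken by the source term: one per addition node."""
    if isinstance(e, int):
        return 0
    _, left, right = e
    return 1 + source_steps(left) + source_steps(right)


def run_translated(e, heap):
    """Evaluate e with a tick in front of every addition; heap["c"] is the counter."""
    if isinstance(e, int):
        return e
    _, left, right = e
    r = run_translated(right, heap)     # right-to-left, as in HeapLang
    l = run_translated(left, heap)
    if heap["c"] == 0:
        raise OutOfBudget()
    heap["c"] -= 1                      # the effect of tick_c on the counter
    return l + r
```

With a budget m at least equal to the number n of source steps, the run completes and leaves m − n in the counter; with a smaller budget, the oops branch is reached.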

The next lemma also holds for every choice of oops. It states that if the 
translated program is safe and if the counter c has not yet reached zero then the 
source program is not just about to crash. 


Lemma 2 (Immediate Safety Preservation). Assume c is fresh for e. Let 
m > 0. If the configuration (((e)), ((σ))[c ← m]) is safe, then either e is a value 
or the configuration (e, σ) is reducible. 


By combining Lemmas 1 and 2 and by contraposition, we find that safety is 
preserved backwards, as follows: if, when the counter c is initialized with m, the 
translated program ((e)) is safe, then the source program e is m-safe. 


Lemma 3 (Safety Preservation). If for every location c the configuration 
(((T)), ((σ))[c ← m]) is safe, then the configuration (T, σ) is m-safe. 


4 Iris with Time Credits 


The authors of Iris [12] have used Coq both to check that Iris is sound and to 
offer an implementation of Iris that can be used to carry out proofs of programs. 
The two are tied: if {True} p {True} can be established by applying the proof 
rules of Iris, then one gets a self-contained Coq proof that the program p is safe. 

In this section, we temporarily focus on time credits and explain how we 
extend Iris with time credits, yielding a new program logic Iris^$. The new logic 
is defined in Coq and still offers an end-to-end guarantee: if {$k} p {True} can 
be established in Coq by applying the proof rules of Iris^$, then one has proved 
in Coq that p is safe and runs in at most k steps. 

To define Iris^$, we compose Iris with the tick translation. We are then able to 
argue that, because this program transformation is operationally correct (that is, 
it faithfully accounts for the passing of time), and because Iris is sound (that is, it 
faithfully approximates the behavior of programs), the result of the composition 
is a sound program logic that is able to reason about time. 

In the following, we view the interface TCIntf as explicitly parameterized 
over $ and tick. Thus, we write “TCIntf ($) tick” for the separating conjunction 
of all items in Fig. 1 except the declarations of $ and tick. 

We require the end user, who wishes to perform proofs of programs in Iris^$, 
to work with Iris^$ triples, which are defined as follows: 


Definition 1 (Iris^$ triple). An Iris^$ triple {P} e {Φ}_$ is syntactic sugar for: 

∀($ : ℕ → iProp) ∀tick TCIntf ($) tick −∗ {P} ((e))_tick {Φ} 


Thus, an Iris^$ triple is in reality an Iris triple about the instrumented expression 
((e))_tick. While proving this Iris triple, the end user is given an abstract view 
of the predicate $ and the instruction tick. He does not have access to their 
concrete definitions, but does have access to the laws that govern them. 

We prove that Iris^$ is sound in the following sense: 


Theorem 1 (Soundness of Iris^$). If {$n} e {True}_$ holds, then the machine 
configuration (e, ∅), where ∅ is the empty heap, is safe and terminates in at 
most n steps. 


In other words, a program that is initially granted n time credits cannot run 
for more than n steps. To establish this theorem, we proceed roughly as follows: 


1. we provide a concrete definition of tick; 

2. we provide a concrete definition of $ and prove that TCIntf ($) tick holds; 

3. this yields {$n} ((e))_tick {True}; from this and from the correctness of the tick 
translation, we deduce that e cannot crash or run for more than n steps. 


Step 1. Our first step is to provide an implementation of tick. As announced 
earlier (Sect. 3), we use tick_c (Fig. 4). We instantiate the parameter oops with 
crash, an arbitrary function whose application is unsafe. (That is, crash is chosen 
so that crash () reduces to a stuck term.) For the moment, c remains a parameter. 
With these concrete choices of tick and oops, the translation transforms an 
out-of-time-budget condition into a hard crash. Because Iris forbids crashes, 
Iris^$, which is the composition of the translation with Iris, will forbid 
out-of-time-budget conditions, as desired. 

For technical reasons, we need two more lemmas about the translation, whose 
proofs rely on the fact that oops is instantiated with crash. They are slightly 
modified or strengthened variants of Lemmas 2 and 3. First, if the source code 
can take one step, then the translated code, supplied with zero budget, crashes. 
Second, if the translated code, supplied with a runtime budget of m, does not 
crash, then the source code terminates in at most m steps. 


Lemma 4 (Credit Exhaustion). Suppose the configuration (T, σ) is reducible. 
Then, for all c, the configuration (((T)), ((σ))[c ← 0]) is unsafe. 


Lemma 5 (Safety Preservation, Strengthened). If for every location c the 
configuration (((T)), ((σ))[c ← m]) is safe, then (T, σ) is safe and terminates in 
at most m steps. 


Step 2. Our second step, roughly, is to exhibit a definition of $ : ℕ → iProp 
such that TCIntf ($) tick_c is satisfied. That is, we would like to prove something 
along the lines of: ∃($ : ℕ → iProp) TCIntf ($) tick_c. However, these informal 
sentences do not quite make sense. This formula is not an ordinary proposition: 
it is an Iris assertion, of type iProp. Thus, it does not make sense to say that 
this formula “is true” in an absolute manner. Instead, we prove in Iris that we 
can make this assertion true by performing a view shift, that is, a number of 
operations that have no runtime effect, such as allocating a ghost location and 
imposing an invariant that ties this ghost state with the physical state of the 
counter c. This is stated as follows: 


Lemma 6 (Time Credit Initialization). For every c and n, the following 
Iris view shift holds: 


(c ↦ n) ⇛ ∃($ : ℕ → iProp) (TCIntf ($) tick_c ∗ $n) 


In this statement, on the left-hand side of the view shift symbol, we find 
the “points-to” assertion c ↦ n, which represents the unique ownership of the 
memory location c and the assumption that its initial value is n. This assertion 
no longer appears on the right-hand side of the view shift. This reflects the fact 
that, when the view shift takes place, it becomes impossible to access c directly; 
the only way of accessing it is via the operation tick. 

On the right-hand side of the view shift symbol, beyond the existential 
quantifier, we find a conjunction of the assertion TCIntf ($) tick_c, which means that 
the laws of time credits are satisfied, and $n, which means that there are initially 
n time credits in existence. 

In the interest of space, we provide only a brief summary of the proof 
of Lemma 6; the reader is referred to the extended version of this paper [18, 
Appendix A] for more details. In short, the assertion $1 is defined in such a way 
that it represents an exclusive contribution of one unit to the current value of 
the global counter c. In other words, we install the following invariant: at every 
time, the current value of c is (at least) the sum of all time credits in existence. 
Thus, the assertion $1 guarantees that c is nonzero, and can be viewed as a 
permission to decrement c by one. This allows us to prove that the specification 
of tick in Fig. 1 is satisfied by our concrete implementation tick_c. In particular, 
tick_c cannot cause a crash: indeed, under the precondition $1, c is not in danger 
of falling below zero, and crash () is not executed; it is in fact dead code. 
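
The invariant can be pictured with a small runtime model. The Python sketch below is hypothetical throughout: in Iris the credits are ghost state with no runtime existence, whereas here a Ledger object records both the physical counter and the sum of credits, and Credit objects are trusted not to be copied. The names Ledger, Credit, initialize, split, discard and tick are introduced only for this illustration.

```python
class Ledger:
    """Physical counter c together with the ghost sum of all credits in existence."""

    def __init__(self, n):
        self.counter = n        # value stored at location c
        self.outstanding = n    # ghost: sum of all time credits

    def invariant(self):
        return self.counter >= self.outstanding


class Credit:
    """An exclusive bundle of `amount` time credits; it must not be copied."""

    def __init__(self, amount):
        self.amount = amount


def initialize(n):
    """Model of Time Credit Initialization: trade c = n for n credits."""
    return Ledger(n), Credit(n)


def split(credit, k):
    """Split k credits off an existing bundle."""
    if k > credit.amount:
        raise ValueError("cannot split off more credits than are owned")
    credit.amount -= k
    return Credit(k)


def discard(ledger, credit):
    """Throw credits away: the counter then exceeds the sum of credits."""
    ledger.outstanding -= credit.amount
    credit.amount = 0


def tick(ledger, credit):
    """Consume one credit and decrement the counter."""
    if credit.amount < 1:
        raise ValueError("tick requires one time credit")
    assert ledger.invariant() and ledger.counter > 0   # follows from owning a credit
    credit.amount -= 1
    ledger.outstanding -= 1
    ledger.counter -= 1
```

Discarding credits is what makes the invariant an inequality ("at least") rather than an equality.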


Step 3. In the last reasoning step, we complete the proof of Theorem 1. The 
proof is roughly as follows. Suppose the end user has established {$n} e {True}_$. 
By Safety Preservation, Strengthened (Lemma 5), to prove that (e, ∅) is safe and 
runs in at most n steps, it suffices to show (for an arbitrary location c) that the 
translated expression ((e)), executed in the initial heap ∅[c ← n], is safe. To do 
so, beginning with this initial heap, we perform Time Credit Initialization, that 
is, we execute the view shift whose statement appears in Lemma 6. This yields 
an abstract predicate $ as well as the assertions TCIntf ($) tick_c and $n. At 
this point, we unfold the Iris^$ triple {$n} e {True}_$, yielding an implication (see 
Definition 1), and apply it to $, to tick_c, and to the hypothesis TCIntf ($) tick_c. 
This yields the Iris triple {$n} ((e)) {True}. Because we have $n at hand and 
because Iris is sound [12], this implies that ((e)) is safe. This concludes the proof. 

This last step is, we believe, where the modularity of our approach shines. 
Iris’ soundness theorem is re-used as a black box, without change. In fact, any 
program logic other than Iris could be used as a basis for our construction, as 
long as it is expressive enough to prove Time Credit Initialization (Lemma 6). 
The last ingredient, Safety Preservation, Strengthened (Lemma 5), involves only 
the operational semantics of HeapLang, and is independent of Iris. 

This was just an informal account of our proof. For further details, the reader 
is referred to the online repository [17]. 


5 Iris with Time Receipts 


In this section, we extend Iris with time receipts and prove the soundness of 
the new logic, dubbed Iris^⧗. To do so, we follow the scheme established in the 
previous section (Sect. 4), and compose Iris with the tick translation. 

From here on, let us view the interface of time receipts as parameterized 
over ⧗, ⧖, and tick. Thus, we write “TRIntf (⧗) (⧖) tick” for the separating 
conjunction of all items in Fig. 3 except the declarations of ⧗, ⧖, and tick. 

As in the case of credits, the user is given an abstract view of time receipts: 


Definition 2 (Iris^⧗ triple). An Iris^⧗ triple {P} e {Φ}_⧗ is syntactic sugar for: 

∀(⧗, ⧖ : ℕ → iProp) ∀tick TRIntf (⧗) (⧖) tick −∗ {P} ((e))_tick {Φ} 


Theorem 2 (Soundness of Iris^⧗). If {True} e {True}_⧗ holds, then the machine 
configuration (e, ∅) is (N − 1)-safe. 


As indicated earlier, we assume that the end user is interested in proving 
that crashes cannot occur until a very long time has elapsed, which is why we 
state the theorem in this way. Whereas an Iris triple {True} e {True} guarantees 
that e is safe, the Iris^⧗ triple {True} e {True}_⧗ guarantees that it takes at least 
N − 1 steps of computation for e to crash. In this statement, N is the global 
parameter that appears in the axiom ⧖N ⇛ False (Fig. 3). Compared with 
Iris, Iris^⧗ provides a weaker safety guarantee, but offers additional reasoning 
principles, leading to increased convenience and modularity. 

In order to establish Theorem 2, we again proceed in three steps: 


1. provide a concrete definition of tick; 
2. provide concrete definitions of ⧗ and ⧖, and prove that TRIntf (⧗) (⧖) tick holds; 
3. from {True} ((e))_tick {True}, deduce that e is (N − 1)-safe. 


Step 1. In this step, we keep our concrete implementation of tick, namely tick_c 
(Fig. 4). One difference with the case of time credits, though, is that we plan to 
initialize c with N − 1. Another difference is that, this time, we instantiate the 
parameter oops with loop, where loop () is an arbitrary divergent term.⁷ 


Step 2. The next step is to prove that we are able to establish the time receipt 
interface. We prove the following: 


Lemma 7 (Time Receipt Initialization). For every location c, the following 
Iris view shift holds: 


(c ↦ N − 1) ⇛ ∃(⧗, ⧖ : ℕ → iProp) TRIntf (⧗) (⧖) tick_c 


We provide only a brief summary of the proof of Lemma 7; for further details, 
the reader is referred to the extended version of this paper [18, Appendix B]. 
Roughly speaking, we install the invariant that c holds N − 1 − i, where i is some 
number that satisfies 0 ≤ i < N. We define ⧗n as an exclusive contribution of n 
units to the current value of i, and define ⧖n as an observation that i is at least 
n. (i grows with time, so such an observation is stable.) As part of the proof of 
the above lemma, we check that the specification of tick holds: 


{⧖n} tick (v) {λw. w = v ∗ ⧗1 ∗ ⧖(n+1)} 


In contrast with the case of time credits, in this case, the precondition ⧖n does 
not guarantee that c holds a nonzero value. Thus, it is possible for tick() to 
be executed when c is zero. This is not a problem, though, because loop() is 
safe to execute in any situation: it satisfies the Hoare triple {True} loop() {False}. 
In other words, when c is about to fall below zero and therefore the invariant 
i < N seems about to be broken, loop () saves the day by running away and 
never allowing execution to continue normally. 
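
The interplay between the counter, the ghost step count i, and loop () can be simulated. The Python sketch below is a rough model under explicit assumptions: the class ReceiptMachine and the exception Diverges are hypothetical, divergence is modeled by raising an exception that is never meant to be caught by the program under analysis, and receipts are plain integers rather than ghost resources.

```python
class Diverges(Exception):
    """Models loop (): control never returns to the caller."""


class ReceiptMachine:
    def __init__(self, N):
        self.N = N
        self.c = N - 1      # physical counter, initialized with N - 1
        self.i = 0          # ghost: number of steps taken so far

    def invariant(self):
        # c holds N - 1 - i, with 0 <= i < N
        return self.c == self.N - 1 - self.i and 0 <= self.i < self.N

    def tick(self, n):
        """From a persistent receipt n, produce an exclusive receipt 1
        and a persistent receipt n + 1."""
        assert n <= self.i, "a persistent receipt is a lower bound on i"
        if self.c == 0:
            raise Diverges()    # oops = loop: run away instead of breaking i < N
        self.c -= 1
        self.i += 1
        return 1, n + 1
```

Every receipt returned by tick remains a valid lower bound on i, and the invariant is never broken: when the counter is exhausted, tick simply does not return.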


6 If the user instead wishes to establish a lower bound on a program’s execution time, 
this is possible as well. 

7 In fact, it is not essential that loop() diverges. What matters is that loop satisfies 
the Iris triple {True} loop() {False}. A fatal runtime error that Iris does not rule out 
would work just as well, as it satisfies the same specification. 


Step 3. In the last reasoning step, we complete the proof of Theorem 2. Suppose 
the end user has established {True} e {True}_⧗. By Safety Preservation (Lemma 3), 
to prove that (e, ∅) is (N − 1)-safe, it suffices to show (for an arbitrary location c) 
that ((e)), executed in the initial heap ∅[c ← N − 1], is safe. To do so, beginning 
with this initial heap, we perform Time Receipt Initialization, that is, we execute 
the view shift whose statement appears in Lemma 7. This yields two abstract 
predicates ⧗ and ⧖ as well as the assertion TRIntf (⧗) (⧖) tick_c. At this point, we 
unfold {True} e {True}_⧗ (see Definition 2), yielding an implication, and apply this 
implication, yielding the Iris triple {True} ((e)) {True}. Because Iris is sound [12], 
this implies that ((e)) is safe. This concludes the proof. For further detail, the 
reader is again referred to our online repository [17]. 


6 Marrying Time Credits and Time Receipts 


It seems desirable to combine time credits and time receipts in a single program 
logic, Iris^$⧗. We have done so [17]. In short, following the scheme of Sects. 4 
and 5, the definition of Iris^$⧗ involves composing Iris with the tick translation. 
This time, tick serves two purposes: it consumes one time credit and produces 
one exclusive time receipt (and increments a persistent time receipt). Thus, its 
specification is as follows: 


{$1 ∗ ⧖n} tick (v) {λw. w = v ∗ ⧗1 ∗ ⧖(n+1)} 


Let us write TCTRIntf ($) (⧗) (⧖) tick for the combined interface of time credits 
and time receipts. This interface combines all of the axioms of Figs. 1 and 3, but 
declares a single tick function⁸ and proposes a single specification for it, which 
is the one shown above. 


Definition 3 (Iris^$⧗ triple). An Iris^$⧗ triple {P} e {Φ}_$⧗ stands for: 

∀($) (⧗) (⧖) tick TCTRIntf ($) (⧗) (⧖) tick −∗ {P} ((e))_tick {Φ} 


Theorem 3 (Soundness of Iris^$⧗). If {$n} e {True}_$⧗ holds then the machine 
configuration (e, ∅) is (N − 1)-safe. If furthermore n < N holds, then this 
machine configuration terminates in at most n steps. 


Iris^$⧗ allows exploiting time credits to prove time complexity bounds and, 
at the same time, exploiting time receipts to prove the absence of certain integer 
overflows. Our verification of Union-Find (Sect. 8) illustrates these two aspects. 


8 Even though the interface provides only one tick function, it gets instantiated in the 
soundness theorem with different implementations depending on whether there are 
more than N time credits or not. 


Guéneau et al. [7] use time credits to reason about asymptotic complexity, 
that is, about the manner in which a program’s complexity grows as the size 
of its input grows towards infinity. Does such asymptotic reasoning make sense 
in Iris^$⧗, where no program is ever executed for N time steps or beyond? It 
seems to be the case that if a program p satisfies the triple {$n} p {Φ}_$⧗, then 
it also satisfies the stronger triple {$min(n, N)} p {Φ}_$⧗, therefore also satisfies 
{$N} p {Φ}_$⧗. Can one therefore conclude that p has “constant time complexity”? 
We believe not. Provided N is considered a parameter, as opposed to a constant, 
one cannot claim that “N is O(1)”, so {$min(n, N)} p {Φ}_$⧗ does not imply 
that “p runs in constant time”. In other words, a universal quantification on N 
should come after the existential quantifier that is implicit in the O notation. We 
have not yet attempted to implement this idea; this remains a topic for further 
investigation. 
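
One possible way of writing down the two quantifier orders is sketched below. This is only an informal reading, not a formalization from the paper: the notation cost_N(p, x), for the number of steps taken by p on input x when the bound is N, and the function f are hypothetical.

```latex
\begin{align*}
% Intended reading: the constant c is chosen before N, so it cannot depend on N.
&\exists c.\ \forall N.\ \forall x.\quad \mathrm{cost}_N(p, x) \le c \cdot f(|x|) \\
% Reading to avoid: c may depend on N. Since no run lasts N steps,
% c = N satisfies this whenever f \ge 1, for every program p.
&\forall N.\ \exists c.\ \forall x.\quad \mathrm{cost}_N(p, x) \le c \cdot f(|x|)
\end{align*}
```

Under the second reading every program would be "O(1)", which is the degenerate conclusion that the paragraph above argues against.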


7 Application: Thunks in Iris^$ 


In this section, we illustrate the power of Iris^$ by constructing an implementation 
of thunks as a library in Iris^$. A thunk, also known as a suspension, is a very 
simple data structure that represents a suspended computation. There are two 
operations on thunks, namely create, which constructs a new thunk, and force, 
which demands the result of a thunk. A thunk memoizes its result, so that even 
if it is forced multiple times, the computation only takes place once. 
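
The memoizing behavior of create and force can be sketched in a few lines of Python. This is an illustration only: the class Thunk and its fields are hypothetical, and the sketch ignores concurrency and reentrancy, which the paper addresses separately.

```python
class Thunk:
    """A suspended computation that remembers its result once it has run."""

    def __init__(self, f):
        self.f = f              # the suspended computation
        self.forced = False     # has the computation been run yet?
        self.value = None       # its result, once available


def create(f):
    """Construct a new, unevaluated thunk."""
    return Thunk(f)


def force(t):
    """Demand the result of a thunk, running the computation at most once."""
    if not t.forced:
        t.value = t.f()
        t.forced = True
        t.f = None              # the computation is no longer needed
    return t.value
```

Forcing the same thunk twice returns the same value, while the side effects of the suspended computation occur only once.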

Okasaki [19] proposes a methodology for reasoning about the amortized time 
complexity of computations that involve shared thunks. For every thunk, he 
keeps track of a debit, which can be thought of as an amount of credit that one 
must still pay before one is allowed to force this thunk. A ghost operation, pay, 
changes one’s view of a thunk, by reducing the debit associated with this thunk. 
force can be applied only to a zero-debit thunk, and has amortized cost O(1). 
Indeed, if this thunk has been forced already, then force really requires constant 
time; and if this thunk is being forced for the first time, then the cost of perform- 
ing the suspended computation must have been paid for in advance, possibly in 
several installments, via pay. This discipline is sound even in the presence of 
sharing, that is, of multiple pointers to a thunk. Indeed, whereas duplicating 
a credit is unsound, duplicating a debit leads to an over-approximation of the 
true cost, hence is sound. Danielsson [6] formulates Okasaki’s ideas as a type 
system, which he proves sound in Agda. Pilkiewicz and Pottier [20] reconstruct 
this type discipline in the setting of a lower-level type system, equipped with 
basic notions of time credits, hidden state, and monotonic state. Unfortunately, 
their type system is presented in an informal manner and does not come with a 
proof of type soundness. 

We reproduce Pilkiewicz and Pottier’s construction in the formal setting of 
Iris^$. Indeed, Iris^$ offers all of the necessary ingredients, namely time credits, 
hidden state (invariants, in Iris terminology) and monotonic state (a special case 
of Iris’ ghost state). Our reconstruction is carried out inside Coq [17]. 


7.1 Concurrency and Reentrancy 


One new problem that arises here is that Okasaki’s analysis, which is valid in a 
sequential setting, potentially becomes invalid in a concurrent setting. Suppose 
we wish to allow multiple threads to safely share access to a thunk. A natural, 
simple-minded approach would be to equip every thunk with a lock and allow 
competition over this lock. Then, unfortunately, forcing would become a blocking 
operation: one thread could waste time waiting for another thread to finish 
forcing. In fact, in the absence of a fairness assumption about the scheduler, 
an unbounded amount of time could be wasted in this way. This appears to 
invalidate the property that force has amortized cost O(1). 

Technically, the manner in which this problem manifests itself in Iris^$ is in 
the specification of locks. Whereas in Iris a spin lock can be implemented and 
proved correct with respect to a simple and well-understood specification [2], in 
Iris^$, it cannot. The lock() method contains a potentially infinite loop: therefore, 
no finite amount of time credits is sufficient to prove that lock() is safe. This 
issue is discussed in greater depth later on (Sect. 9). 

A distinct yet related problem is reentrancy. Arguably, an implementation 
of thunks should guarantee that a suspended computation is evaluated at most 
once. This guarantee seems particularly useful when the computation has a side 
effect: the user can then rely on the fact that this side effect occurs at most 
once. However, this property does not naturally hold: in the presence of heap- 
allocated mutable state, it is possible to construct an ill-behaved “reentrant” 
thunk which, when forced, attempts to recursively force itself. Thus, something 
must be done to dynamically reject or statically prevent reentrancy. In Pilkiewicz 
and Pottier’s code [20], reentrancy is detected at runtime, thanks to a three-color 
scheme, and causes a fatal runtime failure. In a concurrent system where each 
thunk is equipped with a lock, reentrancy is also detected at runtime, and turned 
into deadlock; but we have explained earlier why we wish to avoid locks. 

Fortunately, Iris provides us with a static mechanism for forbidding both 
concurrency and reentrancy. We introduce a unique token ⚡, which can be thought 
of as “permission to use the thunk API”, and set things up so that pay and 
force require and return ⚡. This forbids concurrency: two operations on thunks 
cannot take place concurrently. Furthermore, when a user-supplied suspended 
computation is executed, the token ⚡ is not transmitted to it. This forbids 
reentrancy.⁹ The implementation of this token relies on Iris’ “nonatomic invariants” 
(Sect. 7.4). With these restrictions, we are able to prove that Okasaki’s discipline 
is sound. 


7.2 Implementation of Thunks 


A simple implementation of thunks in HeapLang appears in Fig. 5. A thunk can 
be in one of two states: White f and Black v. A white thunk is unevaluated: 
the function f represents a suspended computation. A black thunk is evaluated: 
the value v is the result of the computation that has been performed already. 
Two colors are sufficient: because our static discipline rules out reentrancy, there 
is no need for a third color, whose purpose would be to dynamically detect an 
attempt to force a thunk that is already being forced. 


9 Therefore, a suspended computation cannot force any thunk. This is admittedly a 
very severe restriction, which rules out many useful applications of thunks. In fact, 
we have implemented a more flexible discipline, where thunks can be grouped in 
multiple “regions” and there is one token per region instead of a single global ⚡ 
token. This discipline allows concurrent or reentrant operations on provably distinct 
thunks, yet can still be proven sound. 


create ≜ λf. ref (White f) 
force  ≜ λt. match !t with 
               White f ⇒ let v = f () in t ← Black v ; v 
             | Black v ⇒ v 
             end 


Fig. 5. An implementation of thunks 


isThunk : Loc → ℕ → (Val → iProp) → iProp     -- there exist “thunks” 

persistent(isThunk t n Φ)                      -- thunks can be shared 

n1 ≤ n2 −∗                                     -- it is sound to 
  isThunk t n1 Φ −∗ isThunk t n2 Φ                overestimate a debt 

⚡ : iProp                                      -- there exist “thunderbolts” 

⚡                                              -- the user is handed one 

{$3 ∗ {$n} ((f ())) {Φ}}                       -- a computation of cost n 
  ((create (f)))                                  gives rise to an n-debit thunk; 
{λt. isThunk t n Φ}                               the cost is O(1) 

(∀v. duplicable(Φ v)) −∗ 
{$11 ∗ isThunk t 0 Φ ∗ ⚡}                      -- a 0-debit thunk can be forced; 
  ((force (t)))                                   the thunderbolt is required; 
{λv. Φ v ∗ ⚡}                                     the cost is O(1) 

isThunk t n Φ ∗ $k ∗ ⚡                         -- paying reduces one’s debt 
  ⇛ isThunk t (n − k) Φ ∗ ⚡ 


Fig. 6. A simple specification of thunks in Iris^$ 


7.3 Specification of Thunks in Iris^$ 


Our specification of thunks appears in Fig. 6. It declares an abstract predicate 
isThunk t n Φ, which asserts that t is a valid thunk, that the debt associated 
with this thunk is n, and that this thunk (once forced) produces a value that 
satisfies the postcondition Φ. The number n, a debit, is the number of credits 
that remain to be paid before this thunk can be forced. The postcondition Φ 
is chosen by the user when a thunk is created. It must be duplicable (this is 
required in the specification of force) because force can be invoked several times 
and we must guarantee, every time, that the result v satisfies Φ v. 

The second axiom states that isThunk t n Φ is a persistent assertion. This 
means that a valid thunk, once created, remains a valid thunk forever. Among 
other things, it is permitted to create two pointers to a single thunk and to 
reason independently about each of these pointers. 

The third axiom states that isThunk t n Φ is covariant in its parameter n. 
Overestimating a debt still leads to a correct analysis of a program’s worst-case 
time complexity. 

Next, the specification declares an abstract assertion ⚡, and provides the user 
with one copy of this assertion. We refer to it as “the thunderbolt”. 

The next item in Fig. 6 is the specification of create. It is higher-order: the 
precondition of create contains a specification of the function f that is passed 
as an argument to create. This axiom states that, if f represents a computa- 
tion of cost n, then create (f) produces an n-debit thunk. The cost of creation 
itself is 3 credits. This specification is somewhat simplistic, as it does not allow 
the function f to have a nontrivial precondition. It is possible to offer a richer 
specification; we eschew it in favor of simplicity. 

Next comes the specification of force. Only a 0-debit thunk can be forced. The 
result is a value v that satisfies Φ. The (amortized) cost of forcing is 11 credits. 
The thunderbolt appears in the pre- and postcondition of force, forbidding any 
concurrent attempts to force a thunk. 

The last axiom in Fig. 6 corresponds to pay. It is a view shift, a ghost oper- 
ation. By paying k credits, one turns an n-debit thunk into an (n — k)-debit 
thunk. At runtime, nothing happens: it is the same thunk before and after the 
payment. Yet, after the view shift, we have a new view of the number of debits 
associated with this thunk. Here, paying requires the thunderbolt. It should be 
possible to remove this requirement; we have not yet attempted to do so. 
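
The discipline imposed by this specification can be imitated with runtime checks. The Python sketch below is hypothetical in every name (Thunderbolt, DebitThunk, pay, force) and simplifies the specification in two labeled ways: the debit is stored once per thunk, whereas in the logic each holder of isThunk may have its own view of it, and n − k is assumed to be truncated at zero. In Iris all of this is ghost state and is checked statically, with no runtime cost.

```python
class Thunderbolt:
    """Unique permission to use the thunk API; `held` is False while it is lent out."""

    def __init__(self):
        self.held = True


class DebitThunk:
    def __init__(self, f, cost):
        self.f = f
        self.debit = cost       # ghost: credits that remain to be paid
        self.forced = False
        self.value = None


def pay(t, k, bolt):
    """Ghost operation: paying k credits reduces the debit of t."""
    if not bolt.held:
        raise PermissionError("pay requires the thunderbolt")
    t.debit = max(0, t.debit - k)   # assumed: subtraction truncated at zero


def force(t, bolt):
    """Force a 0-debit thunk; the thunderbolt is required and returned."""
    if not bolt.held:
        raise PermissionError("force requires the thunderbolt")
    if t.debit != 0:
        raise ValueError("only a 0-debit thunk can be forced")
    if not t.forced:
        bolt.held = False           # the token is not transmitted to f
        try:
            t.value = t.f()
        finally:
            bolt.held = True
        t.forced = True
    return t.value
```

Because the suspended computation runs without the token, a thunk that tries to force itself (or any other thunk) is rejected, which mirrors the way the specification rules out reentrancy.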


7.4 Proof of Thunks in Iris^$ 


After implementing thunks in HeapLang (Sect. 7.2) and expressing their 
specification in Iris^$ (Sect. 7.3), it remains to prove that this specification can be 
established. We sketch the key ideas of this proof. 

Following Pilkiewicz and Pottier [20], when a new thunk is created, we install 
a new Iris invariant, which describes this thunk. The invariant is as follows: 


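\[
\begin{array}{l}
\mathit{ThunkInv}\ t\ \gamma\ \mathit{nc}\ \Phi \;\triangleq\; \\
\quad \exists \mathit{ac}.\ \Bigl(\, [\bullet\,\mathit{ac}]^{\gamma} \;\ast\;
  \bigl(\, (\exists f.\ t \mapsto \mathrm{White}\ f \;\ast\; \{\$\mathit{nc}\}\ f\,()\ \{\Phi\} \;\ast\; \$\mathit{ac}) \\
\qquad\qquad\qquad\qquad\ \ \lor\ (\exists v.\ t \mapsto \mathrm{Black}\ v \;\ast\; \Phi\ v) \,\bigr) \Bigr)
\end{array}
\]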


γ is a ghost location, which we allocate at the same time as the thunk t. It holds 
elements of the authoritative monoid AUTH(ℕ, max) [12]. The variable nc, for 
“necessary credits”, is the cost of the suspended computation: it appears in the 
precondition of f. The variable ac, for “available credits”, is the number of credits that 
have been paid so far. The disjunction inside the invariant states that: 


— either the thunk is white, in which case we have ac credits at hand; 
— or the thunk is black, in which case we have no credits at hand, as they have 
been spent already. 

The predicate isThunk t n Φ is then defined as follows: 

\[
\mathit{isThunk}\ t\ n\ \Phi \;\triangleq\;
\exists \gamma, \mathit{nc}.\ \Bigl(\, [\circ\,(\mathit{nc} - n)]^{\gamma} \;\ast\;
\mathit{NaInv}\bigl(\mathit{ThunkInv}\ t\ \gamma\ \mathit{nc}\ \Phi\bigr) \Bigr)
\]

The assertion ◦(nc − n), held at ghost location γ inside isThunk t n Φ, confronted 
with the authoritative assertion •ac at γ that can be obtained by acquiring the 
invariant, implies the inequality nc − n ≤ ac, therefore nc ≤ ac + n. That is, the 
credits paid so far (ac) plus the credits that remain to be paid (n) are sufficient 
to cover the actual cost of the computation (nc). In particular, in the proof 
of force, we have a 0-debit thunk, so nc ≤ ac holds. In the case where the thunk 
is white, this means that the ac credits that we have at hand are sufficient to 
justify the call f (), which requires nc credits. 

The final aspect that remains to be explained is our use of NaInv(···), an Iris 
“nonatomic invariant”. Indeed, in this proof, we cannot rely on Iris’ primitive 
invariants. A primitive invariant can be acquired only for the duration of an 
atomic instruction [12]. In our implementation of thunks (Fig. 5), however, we 
need a “critical section” that encompasses several instructions. That is, we must 
acquire the invariant before dereferencing t, and (in the case where this thunk is 
white) we cannot release it until we have marked this thunk black. Fortunately, 
Iris provides a library of “nonatomic invariants” for this very purpose. (This 
library is used in the RustBelt project [10] to implement Rust’s type Cell.) This 
library offers separate ghost operations for acquiring and releasing an invariant. 
Acquiring an invariant consumes a unique token, which is recovered when the 
invariant is released: this guarantees that an invariant cannot be acquired twice, 
or in other words, that two threads cannot be in a critical section at the same 
time. The unique token involved in this protocol is the one that we expose to 
the end user as “the thunderbolt”. 
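
The token discipline just described can be mimicked at runtime. The sketch below is only an analogy, written in Python with hypothetical names (Thunderbolt, critical_section): in Iris the token is ghost state and the exclusion is established statically by the proof, whereas here a violation is detected dynamically. It does not reproduce the API of the Iris nonatomic-invariant library.

```python
from contextlib import contextmanager


class Thunderbolt:
    """Hypothetical stand-in for the unique token; at most one holder at a time."""

    def __init__(self):
        self.available = True


@contextmanager
def critical_section(token):
    # Acquiring the invariant consumes the token ...
    if not token.available:
        raise RuntimeError("invariant already acquired (reentrant or concurrent use)")
    token.available = False
    try:
        yield
    finally:
        # ... and releasing the invariant gives the token back.
        token.available = True
```

A nested use of critical_section with the same token raises an error, which is the dynamic counterpart of the fact that the invariant cannot be acquired twice.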


8 Application: Union-Find in Iris^{$⧗} 


As an illustration of the use of both time credits and time receipts, we formally 
verify the functional correctness and time complexity of an implementation of 
the Union-Find data structure. Our proof [17] is based on Charguéraud and 
Pottier’s work [4]. We port their code from OCaml to HeapLang, and port their 
proof from Separation Logic with Time Credits to Iris^{$⧗}. At this point, the proof 
exploits just Iris^$, a subset of Iris^{$⧗}. The mathematical analysis of Union-Find, 
which represents a large part of the proof, is unchanged. Our contribution lies in 
the fact that we modify the data structure to represent ranks as machine integers 
instead of unbounded integers, and exploit time receipts in Iris^{$⧗} to establish the 
absence of overflow. We equip HeapLang with signed machine integers whose bit 
width is a parameter w. Under the hypothesis log log N < w − 1, we are able 
to prove that, even though the code uses limited-width machine integers, no 
overflow can occur in a feasible time. If for instance N is 2^63, then this condition 
boils down to w ≥ 7. Ranks can be stored in just 7 bits without risking overflow. 
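
The arithmetic behind this bit-width claim can be checked mechanically. The following Python sketch uses hypothetical helper names (min_rank_width, max_signed) and assumes base-2 logarithms; it tests the hypothesis log log N < w − 1 in the equivalent integer form N < 2^(2^(w−1)), which avoids floating-point logarithms.

```python
def min_rank_width(N):
    """Smallest bit width w such that log2(log2(N)) < w - 1.

    The condition is checked as N < 2 ** (2 ** (w - 1)), which is
    equivalent for N >= 1 because log2 is strictly increasing.
    """
    w = 1
    while not (N < 2 ** (2 ** (w - 1))):
        w += 1
    return w


def max_signed(w):
    """Largest value representable by a signed w-bit machine integer."""
    return 2 ** (w - 1) - 1
```

For N = 2**63, min_rank_width returns 7, and max_signed(7) is 63, which is exactly log2 N: a rank bounded by log N fits in a signed 7-bit integer.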

As in Charguéraud and Pottier’s work, the Union-Find library advertises 
an abstract representation predicate isUF D R V, which describes a well-formed, 
uniquely-owned Union-Find data structure. The parameter D, a set of nodes, is 
the domain of the data structure. The parameter R, a function, maps a node 
to the representative element of its equivalence class. The parameter V, also a 
function, maps a node to a payload value associated with its equivalence class. 
We do not show the specification of every operation. Instead, we focus on union, 
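which merges two equivalence classes. We establish the following Iris^{$⧗} triple: 

\[
\begin{array}{l}
x \in D \;\land\; y \in D \;\land\; \log\log N < w - 1 \;\Longrightarrow \\[2pt]
\quad \{\, \mathit{isUF}\ D\ R\ V \;\ast\; \$\bigl(44\,\alpha(|D|) + 152\bigr) \,\} \\
\qquad \mathit{union}\,(x, y) \\
\quad \{\, \lambda z.\ \mathit{isUF}\ D\ R'\ V' \;\ast\; \bigl(z = R(x) \lor z = R(y)\bigr) \,\}
\end{array}
\]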

where the functions R′ and V′ are defined as follows:¹⁰ 

\[
(R'(w),\, V'(w)) \;=\;
\begin{cases}
(z,\; V(z)) & \text{if } R(w) = R(x) \text{ or } R(w) = R(y) \\
(R(w),\; V(w)) & \text{otherwise}
\end{cases}
\]

The hypotheses x ∈ D and y ∈ D and the conjunct isUF D R V in the 
precondition require that x and y be two nodes in a valid Union-Find data 
structure. The postcondition λz. ... describes the state of the data structure 
after the operation and the return value z. 

The conjunct $(44 α(|D|) + 152) in the precondition indicates that union has 
time complexity O(α(n)), where α is an inverse of Ackermann’s function and 
n is the number of nodes in the data structure. This is an amortized bound; 
the predicate isUF also contains a certain number of time credits, known as the 
potential of the data structure, which are used to justify union operations whose 
actual cost exceeds the advertised cost. The constants 44 and 152 differ from 
those found in Charguéraud and Pottier’s specification [4] because Iris counts 
every computation step, whereas they count only function calls. Abstracting 
these constants by using O notation, as proposed by Guéneau et al. [7], would 
be desirable, but we have not attempted to do so yet. 

The main novelty, with respect to Charguéraud and Pottier’s specification, 
is the hypothesis log log N < w − 1, which is required to prove that no overflow 
can occur when the rank of a node is incremented. In our proof, N and w are 
parameters; once their values are chosen, this hypothesis is easily discharged, 
once and for all. In the absence of time receipts, we would have to publish the 
hypothesis log log n < w − 1, where n is the cardinal of D, forcing every (direct 
and indirect) user of the data structure to keep track of this requirement. 

10 This definition of R′ and V′ has free variables x, y, z, therefore in reality must appear 
inside the postcondition. Here, it is presented separately, for greater readability. 

For the proof to go through, we store n time receipts in the data structure: 
that is, we include the conjunct ⧗n, where n stands for |D|, in the definition of 
the invariant isUF D R V. The operation of creating a new node takes at least one 
step, therefore produces one new time receipt, which is used to prove that the 
invariant is preserved by this operation. At any point, then, from the invariant, 
and from the basic laws of time receipts, we can deduce that n < N holds. 
Furthermore, it is easy to show that a rank is at most log n. Therefore, a rank 
is at most log N. In combination with the hypothesis log log N < w − 1, this 
suffices to prove that a rank is at most 2^(w−1) − 1, the largest signed machine 
integer, and therefore that no overflow can occur in the computation of a rank. 
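
The fact that ranks stay logarithmic in the number of nodes can be observed on a small executable model. The Python sketch below is an illustration only: the class UnionFind is hypothetical, array-based, and unrelated to the verified HeapLang code, and its runtime assertion merely stands in for the overflow-freedom property that the proof establishes statically.

```python
class UnionFind:
    """Union-Find with union by rank; ranks are checked against a w-bit bound."""

    def __init__(self, w):
        self.max_rank = 2 ** (w - 1) - 1   # largest signed w-bit integer
        self.parent = []
        self.rank = []

    def make(self):
        """Create a new singleton class and return its node."""
        node = len(self.parent)
        self.parent.append(node)
        self.rank.append(0)
        return node

    def find(self, x):
        """Return the representative of x, compressing the path on the way."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        """Merge the classes of x and y; return the new representative."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            # The only place where a rank grows, by exactly one.
            assert self.rank[rx] < self.max_rank, "rank overflow"
            self.rank[rx] += 1
        return rx
```

Merging 64 singleton nodes pairwise, tournament style, is a worst case for rank growth: it yields a single class whose root has rank 6 = log2 64, which still fits within the bound 7 of a signed 4-bit integer.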

Clochard et al. [5, §2] already present Union-Find as a motivating example 
among several others. They write that “there is obviously no danger of arithmetic 
overflow here, since [ranks] are only obtained by successive increments by one”. 
This argument would be formalized in their system by representing ranks as 
either “one-time” or “peano” integers (in our terminology, clocks or snapclocks). 
This argument could be expressed in Iris^{$⧗}, but would lead to requiring log N < 
w − 1. In contrast, we use a more refined argument: we note that ranks are 
logarithmic in n, the number of nodes, and that n itself can never overflow. This 
leads us to the much weaker requirement log log N < w − 1, which means that 
a rank can be stored in very few bits. We believe that this argument cannot be 
expressed in Clochard et al.’s system. 


9 Discussion 


One feature of Iris and HeapLang that deserves further discussion is concur- 
rency. Iris is an evolution of Concurrent Separation Logic, and HeapLang has 
shared-memory concurrency. How does this impact our reasoning about time? 
At a purely formal level, this does not have any impact: Theorems 1, 2, 3 and 
their proofs are essentially oblivious to the absence or presence of concurrency 
in the programming language. At a more informal level, though, this impacts 
our interpretation of the real-world meaning of these theorems. Whereas in a 
sequential setting a “number of computation steps” can be equated (up to a 
constant factor) with “time”, in a concurrent setting, a “number of computation 
steps” is referred to as “work”, and is related to “time” only up to a factor of p, 
the number of processors. In short, our system measures work, not time. The 
number of available processors should be taken into account when choosing a 
specific value of N: this value must be so large that N computation steps are 
infeasible even by p processors. With this in mind, we believe that our system 
can still be used to prove properties that have physical relevance. 
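
The relation between work and time can be made concrete with a back-of-the-envelope computation. The Python sketch below uses a hypothetical function name and assumes, purely for illustration, perfect parallelism and a fixed rate of computation steps per second and per processor; neither assumption comes from the paper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(N, processors, steps_per_second):
    """Years needed to perform N computation steps in total.

    Assumes every processor performs `steps_per_second` steps per second
    and that the work is shared perfectly among all processors.
    """
    return N / (processors * steps_per_second) / SECONDS_PER_YEAR
```

Under these assumptions, 2**63 steps at 10^9 steps per second take about 292 years on one processor, but only about 4.6 years on 64 processors, which is why the number of processors must be taken into account when choosing N.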

In short, our new program logics, Iris^$, Iris^⧗, and Iris^{$⧗}, tolerate concurrency. 
Yet, is it fair to say that they have “good support” for reasoning about concur- 
rent programs? We believe not yet, and this is an area for future research. The 
main open issue is that we do not at this time have good support for reason- 
ing about the time complexity of programs that perform busy-waiting on some 
resource. The root of the difficulty, already mentioned during the presentation of 
thunks (Sect. 7.1), is that one thread can fail to make progress, due to interfer- 
ence with another thread. A retry is then necessary, wasting time. In a spin lock, 
for instance, the “compare-and-set” (CAS) instruction that attempts to acquire 
the lock can fail. There is no bound on the number of attempts that are required 
until the lock is eventually acquired. Thus, in Iris^$, we are currently unable to 
assign any specification to the lock method of a spin lock. 

In the future, we wish to take inspiration from Hoffmann, Marmar and 
Shao [9], who use time credits in Concurrent Separation Logic to establish the 
lock-freedom of several concurrent data structures. The key idea is to formalize 
the informal argument that “failure of a thread to make progress is caused by 
successful progress in another thread”. Hoffmann et al. set up a “quantitative 
compensation scheme”, that is, a protocol by which successful progress in one 
thread (say, a successful CAS operation) must transmit a number of time credits 
to every thread that has encountered a corresponding failure and therefore must 
retry. Quite interestingly, this protocol is not hardwired into the reasoning rule 
for CAS. In fact, CAS itself is not primitive; it is encoded in terms of an atomic 
{ ... } construct. The protocol is set up by the user, by exploiting the basic tools 
of Concurrent Separation Logic, including shared invariants. Thus, it should be 
possible in Iris^$ to reproduce Hoffmann et al.’s reasoning and to assign useful 
specifications to certain lock-free data structures. Furthermore, we believe that, 
under a fairness assumption, it should be possible to assign Iris specifications 
also to coarse-grained data structures, which involve locks. Roughly speaking, 
under a fair scheduler, the maximum time spent waiting for a lock is the max- 
imum number of threads that may compete for this lock, multiplied by the 
maximum cost of a critical section protected by this lock. Whether and how this 
can be formalized is a topic of future research. 

The axiom ⧗N ⇛ False comes with a few caveats that should be mentioned. 
The same caveats apply to Clochard et al.’s system [5], and are known to them. 

One caveat is that it is possible in theory to use this axiom to write and justify 
surprising programs. For instance, in Iris^⧗, the loop “for i = 1 to N do () done” 
satisfies the specification {True} - {False}: that is, it is possible to prove that this 
loop “never ends”. As a consequence, this loop also satisfies every specification 
of the form {True} - {Φ}. On the face of it, this loop would appear to be a 
valid solution to every programming assignment! In practice, it is up to the user 
to exhibit taste and to refrain from exploiting such a paradox. In reality, the 
situation is no worse than that in plain Iris, a logic of partial correctness, where 
the infinite loop “while true do () done” also satisfies {True} - {False}. 

Another important caveat is that the compiler must in principle be instructed 
to never optimize ticks away. If, for instance, the compiler were allowed to recognize 
that the loop “for i = 1 to N do () done” does nothing, and to replace this 
loop with a no-op, then this loop, which according to Iris^⧗ “never ends”, would 
in reality end immediately. We would thereby be in danger of proving that a 
source program cannot crash unless it is allowed to run for centuries, whereas 
in reality the corresponding compiled program does crash in a short time. In 
practice, this danger can be avoided by actually instrumenting the source code 
with tick() instructions and by presenting tick to the compiler as an unknown 
external function, which cannot be optimized away. However, this seems a pity, 
as it disables many compiler optimizations. 
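
The effect of instrumentation can be pictured with a toy model. The Python sketch below is an assumption-laden illustration: StepCounter, instrumented_loop, and optimized_loop are hypothetical names, and the counter is only a stand-in for the tick function, whose actual definition belongs to the tick translation presented earlier in the paper and is not reproduced here.

```python
class StepCounter:
    """Hypothetical step counter; tick() is an observable side effect."""

    def __init__(self):
        self.steps = 0

    def tick(self):
        self.steps += 1


def instrumented_loop(n, counter):
    """The loop `for i = 1 to n do () done`, with one tick per iteration."""
    for _ in range(n):
        counter.tick()   # opaque call: removing it would change the count


def optimized_loop(n, counter):
    """What an optimizer could produce from the uninstrumented loop: a no-op."""
    return None
```

Running instrumented_loop advances the counter by n, whereas the optimized variant leaves it unchanged; the reasoning about time is valid for the former only.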

We believe that, despite these pitfalls, time receipts can be a useful tool. We 
hope that, in the future, better ways of avoiding these pitfalls will be discovered. 


10 Related Work 


Time credits in an affine Separation Logic are not a new concept. Atkey [1] 
introduces them in the setting of Separation Logic. Pilkiewicz and Pottier [20] 
exploit them in an informal reconstruction of Danielsson’s type discipline for 
lazy thunks [6], which itself is inspired by Okasaki’s work [19]. Several authors 
subsequently exploit time credits in machine-checked proofs of correctness and 
time complexity of algorithms and data structures [4,7,22]. Hoffmann, Marmar 
and Shao [9], whose work was discussed earlier in this paper (Sect. 9), use time 
credits in Concurrent Separation Logic to prove that several concurrent data 
structure implementations are lock-free. 

At a metatheoretic level, Charguéraud and Pottier [4] provide a machine- 
checked proof of soundness of a Separation Logic with time credits. Haslbeck 
and Nipkow [8] compare three program logics that can provide worst-case time 
complexity guarantees, including Separation Logic with time credits. 

To the best of our knowledge, affine (exclusive and persistent) time receipts 
are new, and the axiom ⧗N ⇛ False is new as well. It is inspired by Clochard 
et al.’s idea that “programs cannot run for centuries” [5], but distills this idea 
into a simpler form. 

Our implementation of thunks and our reconstruction of Okasaki’s debits [19] 
in terms of credits are inspired by earlier work [6,20]. Although Okasaki’s analysis 
assumes a sequential setting, we adapt it to a concurrent setting by explicitly 
forbidding concurrent operations on thunks; to do so, we rely on Iris nonatomic 
invariants. In contrast, Danielsson [6] views thunks as a primitive construct in an 
otherwise pure language. He equips the language with a type discipline, where 
the type Thunk, which is indexed with a debit, forms a monad, and he provides 
a direct proof of type soundness. The manner in which Danielsson inserts tick 
instructions into programs is a precursor of our tick translation; this idea can 
in fact be traced at least as far back as Moran and Sands [16]. Pilkiewicz and 
Pottier [20] sketch an encoding of debits in terms of credits. Because they work in 
a sequential setting, they are able to install a shared invariant by exploiting the 
anti-frame rule [21], whereas we use Iris’ nonatomic invariants for this purpose. 
The anti-frame rule does not rule out reentrancy, so they must detect it at 
runtime, whereas in our case both concurrency and reentrancy are ruled out by 
our use of nonatomic invariants. 

Madhavan et al. [15] present an automated system that infers and verifies 
resource bounds for higher-order functional programs with thunks (and, more 
generally, with memoization tables). They transform the source program to an 
instrumented form where the state is explicit and can be described by monotone 
assertions. For instance, it is possible to assert that a thunk has been forced 
already (which guarantees that forcing it again has constant cost). This seems 
analogous in Okasaki’s terminology to asserting that a thunk has zero debits, 
also a monotone assertion. We presently do not know whether Madhavan et al.’s 
system could be encoded into a lower-level program logic such as Iris^$; it would 
be interesting to find out. 


11 Conclusion 


We have presented two mechanisms, namely time credits and time receipts, by 
which Iris, a state-of-the-art concurrent program logic, can be extended with 
means of reasoning about time. We have established soundness theorems that 
state precisely what guarantees are offered by the extended program logics Iris^$, 
Iris^⧗, and Iris^{$⧗}. We have defined these new logics modularly, by composing Iris 
with a program transformation. The three proofs follow a similar pattern: the 
soundness theorem of Iris is composed with a simulation lemma about the tick 
translation. We have illustrated the power of the new logics by reconstructing 
Okasaki’s debit-based analysis of thunks, by reconstructing Clochard et al.’s 
technique for proving the absence of certain integer overflows, and by presenting 
an analysis of Union-Find that exploits both time credits and time receipts. 

One limitation of our work is that all of our metatheoretic results are specific 
to HeapLang, and would have to be reproduced, following the same pattern, if 
one wished to instantiate Iris^{$⧗} for another programming language. It would be 
desirable to make our statements and proofs generic. In future work, we would 
also like to better understand what can be proved about the time complexity 
of concurrent programs that involve waiting. Can the time spent waiting be 
bounded? What specification can one give to a lock, or a thunk that is protected 
by a lock? A fairness hypothesis about the scheduler seems to be required, but it 
is not clear yet how to state and exploit such a hypothesis. Hoffmann, Marmar 
and Shao [9] have carried out pioneering work in this area, but have dealt only 
with lock-free data structures and only with situations where the number of 
competing threads is fixed. It would be interesting to transpose their work into 
Iris^$ and to develop it further. 

Abstract. We introduce Meta-F*, a tactics and metaprogramming 
framework for the F* program verifier. The main novelty of Meta-F* is 
allowing the use of tactics and metaprogramming to discharge assertions 
not solvable by SMT, or to just simplify them into well-behaved SMT fragments. 
Plus, Meta-F* can be used to generate verified code automatically. 

Meta-F* is implemented as an F* effect, which, given the powerful effect 
system of F*, heavily increases code reuse and even enables the lightweight 
verification of metaprograms. Metaprograms can be either interpreted, or 
compiled to efficient native code that can be dynamically loaded into the 
F* type-checker and can interoperate with interpreted code. Evaluation 
on realistic case studies shows that Meta-F* provides substantial gains in 
proof development, efficiency, and robustness. 


Keywords: Tactics · Metaprogramming · Program verification · 
Verification conditions · SMT solvers · Proof assistants 


1 Introduction 


Scripting proofs using tactics and metaprogramming has a long tradition in inter- 
active theorem provers (ITPs), starting with Milner’s Edinburgh LCF [37]. In 
this lineage, properties of pure programs are specified in expressive higher-order 
(and often dependently typed) logics, and proofs are conducted using various 
imperative programming languages, starting originally with ML. 

Along a different axis, program verifiers like Dafny [47], VCC [23], Why3 [33], 
and Liquid Haskell [59] target both pure and effectful programs, with side-effects 
ranging from divergence to concurrency, but provide relatively weak logics for 
specification (e.g., first-order logic with a few selected theories like linear arith- 
metic). They work primarily by computing verification conditions (VCs) from 
programs, usually relying on annotations such as pre- and postconditions, and 
encoding them to automated theorem provers (ATPs) such as satisfiability mod- 
ulo theories (SMT) solvers, often providing excellent automation. 

These two sub-fields have influenced one another, though the situation is 
somewhat asymmetric. On the one hand, most interactive provers have gained 
support for exploiting SMT solvers or other ATPs, providing push-button 
automation for certain kinds of assertions [26,31,43,44,54]. On the other hand, 
recognizing the importance of interactive proofs, Why3 [33] interfaces with ITPs 
like Coq. However, working over proof obligations translated from Why3 requires 
users to be familiar not only with both these systems, but also with the specifics 
of the translation. And beyond Why3 and the tools based on it [25], no other 
SMT-based program verifiers have full-fledged support for interactive proving, 
leading to several downsides: 


Limits to expressiveness. The expressiveness of program verifiers can be lim- 
ited by the ATP used. When dealing with theories that are undecidable and 
difficult to automate (e.g., non-linear arithmetic or separation logic), proofs in 
ATP-based systems may become impossible or, at best, extremely tedious. 


Boilerplate. To work around this lack of automation, programmers have to 
construct detailed proofs by hand, often repeating many tedious yet error-prone 
steps, so as to provide hints to the underlying solver to discover the proof. 
In contrast, ITPs with metaprogramming facilities excel at expressing domain- 
specific automation to complete such tedious proofs. 


Implicit proof context. In most program verifiers, the logical context of a 
proof is implicit in the program text and depends on the control flow and the pre- 
and postconditions of preceding computations. Unlike in interactive proof assis- 
tants, programmers have no explicit access, neither visual nor programmatic, to 
this context, making proof structuring and exploration extremely difficult. 

In direct response to these drawbacks, we seek a system that successfully 
combines the convenience of an automated program verifier for the common case, 
while seamlessly transitioning to an interactive proving experience for those parts 
of a proof that are hard to automate. Towards this end, we propose Meta-F*, a 
tactics and metaprogramming framework for the F* [1,58] program verifier. 


Highlights and Contributions of Meta-F* 


F* has historically been more deeply rooted as an SMT-based program verifier. 
Until now, F* discharged VCs exclusively by calling an SMT solver (usually 
Z3 [28]), providing good automation for many common program verification 
tasks, but also exhibiting the drawbacks discussed above. 

Meta-F* is a framework that allows F* users to manipulate VCs using tactics. 
More generally, it supports metaprogramming, allowing programmers to script 
the construction of programs, by manipulating their syntax and customizing the 
way they are type-checked. This allows programmers to (1) implement custom 
procedures for manipulating VCs; (2) eliminate boilerplate in proofs and programs; 
and (3) inspect the proof state visually and manipulate it programmatically, 
addressing the drawbacks discussed above. SMT still plays a central 
role in Meta-F*: a typical usage involves implementing tactics to transform VCs, 
so as to bring them into theories well-supported by SMT, without needing to 
(re)implement full decision procedures. Further, the generality of Meta-F* allows 
implementing non-trivial language extensions (e.g., typeclass resolution) entirely 
as metaprogramming libraries, without changes to the F* type-checker. 
The technical contributions of our work include the following: 


“Meta-” is just an effect (Sect. 3.1). Meta-F* is implemented using F*’s 
extensible effect system, which keeps programs and metaprograms properly iso- 
lated. Being first-class F* programs, metaprograms are typed, call-by-value, 
direct-style, higher-order functional programs, much like the original ML. Further, 
metaprograms can be themselves verified (to a degree, see Sect. 3.4) and 
metaprogrammed. 


Reconciling tactics with VC generation (Sect. 4.2). In program verifiers 
the programmer often guides the solver towards the proof by supplying inter- 
mediate assertions. Meta-F* retains this style, but additionally allows assertions 
to be solved by tactics. To this end, a contribution of our work is extracting, 
from a VC, a proof state encompassing all relevant hypotheses, including those 
implicit in the program text. 


Executing metaprograms efficiently (Sect. 5). Metaprograms are executed 
during type-checking. As a baseline, they can be interpreted using F*’s exist- 
ing (but slow) abstract machine for term normalization, or a faster normalizer 
based on normalization by evaluation (NbE) [10,16]. For much faster execution 
speed, metaprograms can also be run natively. This is achieved by combining 
the existing extraction mechanism of F* to OCaml with a new framework for 
safely extending the F* type-checker with such native code. 


Examples (Sect. 2) and evaluation (Sect. 6). We evaluate Meta-F* on several 
case studies. First, we present a functional correctness proof for the Poly1305 
message authentication code (MAC) [11], using a novel combination of proofs 
by reflection for dealing with non-linear arithmetic and SMT solving for linear 
arithmetic. We measure a clear gain in proof robustness: SMT-only proofs 
succeed only rarely (for reasonable timeouts), whereas our tactic+SMT proof 
is concise, never fails, and is faster. Next, we demonstrate an improvement in 
expressiveness, by developing a small library for proofs of heap-manipulating 
programs in separation logic, which was previously out-of-scope for F*. Finally, 
we illustrate the ability to automatically construct verified effectful programs, by 
introducing a library for metaprogramming verified low-level parsers and serial- 
izers with applications to network programming, where verification is accelerated 
by processing the VC with tactics, and by programmatically tweaking the SMT 
context. 

We conclude that tactics and metaprogramming can be prosperously com- 
bined with VC generation and SMT solving to build verified programs with 
better, more scalable, and more robust automation. 

The full version of this paper, including appendices, can be found online at 
https://www.fstar-lang.org/papers/metafstar. 


2 Meta-F* by Example 


F* is a general-purpose programming language aimed at program verification. It 
puts together the automation of an SMT-backed deductive verification tool with 
the expressive power of a language with full-spectrum dependent types. Briefly, it 
is a functional, higher-order, effectful, dependently typed language, with syntax 
loosely based on OCaml. F* supports refinement types and Hoare-style specifi- 
cations, computing VCs of computations via a type-level weakest precondition 
(WP) calculus packed within Dijkstra monads [57]. F*’s effect system is also 
user-extensible [1]. Using it, one can model or embed imperative programming 
in styles ranging from ML to C [55] and assembly [35]. After verification, F* pro- 
grams can be extracted to efficient OCaml or F# code. A first-order fragment 
of F*, called Low*, can also be extracted to C via the KreMLin compiler [55]. 

This paper introduces Meta-F*, a metaprogramming framework for F* that 
allows users to safely customize and extend F* in many ways. For instance, Meta- 
F* can be used to preprocess or solve proof obligations; synthesize F* expressions; 
generate top-level definitions; and resolve implicit arguments in user-defined 
ways, enabling non-trivial extensions. This paper primarily discusses the first 
two features. Technically, none of these features deeply increase the expressive 
power of F*, since one could manually program in F* terms that can now be 
metaprogrammed. However, as we will see shortly, manually programming terms 
and their proofs can be so prohibitively costly as to be practically infeasible. 

Meta-F* is similar to other tactic frameworks, such as Coq’s [29] or 
Lean’s [30], in presenting a set of goals to the programmer, providing commands 
to break them down, allowing them to inspect and build abstract syntax, etc. In this 
paper, we mostly detail the characteristics where Meta-F* differs from other 
engines. 

This section presents Meta-F* informally, displaying its usage through case 
studies. We present any necessary F* background as needed. 


2.1 Tactics for Individual Assertions and Partial Canonicalization 


Non-linear arithmetic reasoning is crucially needed for the verification of opti- 
mized, low-level cryptographic primitives [18,64], an important use case for F* 
[13] and other verification frameworks, including those that rely on SMT solv- 
ing alone (e.g., Dafny [47]) as well as those that rely exclusively on tactic-based 
proofs (e.g., FiatCrypto [32]). While both styles have demonstrated significant 
successes, we make a case for a middle ground, leveraging the SMT solver for 
the parts of a VC where it is effective, and using tactics only where it is not. 

We focus on Poly1305 [11], a widely-used cryptographic MAC that computes 
a series of integer multiplications and additions modulo a large prime number 
p = 2^130 − 5. Implementations of the Poly1305 multiplication and mod operations 
are carefully hand-optimized to represent 130-bit numbers in terms of smaller 
32-bit or 64-bit registers, using clever tricks; proving their correctness requires 
reasoning about long sequences of additions and multiplications. 
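
As an illustration of why this prime lends itself to such tricks, the congruence 2^130 ≡ 5 (mod p) lets one reduce a large value by folding its high bits back onto its low bits. The Python sketch below is not taken from the verified implementations discussed here: the function name reduce_mod_p is hypothetical, and it operates on unbounded integers, whereas real code works limb by limb on machine registers.

```python
P = 2**130 - 5


def reduce_mod_p(x):
    """Reduce a non-negative integer modulo 2^130 - 5.

    Writing x = hi * 2^130 + lo and using 2^130 = 5 (mod P),
    x is congruent to lo + 5 * hi, which is strictly smaller than x.
    """
    mask = (1 << 130) - 1
    while x >> 130:
        x = (x & mask) + 5 * (x >> 130)
    # Now 0 <= x < 2^130; at most one subtraction of P is still needed.
    if x >= P:
        x -= P
    return x
```

The result agrees with Python's built-in x % P; the folding step only needs shifts, masks, and a multiplication by 5.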


Previously: Guiding SMT Solvers by Manually Applying Lemmas. 
Prior proofs of correctness of Poly1305 and other cryptographic primitives using 
SMT-based program verifiers, including F* [64] and Dafny [18], use a combi- 
nation of SMT automation and manual application of lemmas. On the plus 
side, SMT solvers are excellent at linear arithmetic, so these proofs delegate all 
associativity-commutativity (AC) reasoning about addition to SMT. Non-linear 
arithmetic in SMT solvers, even just AC-rewriting and distributivity, is, however, 
inefficient and unreliable, so much so that the prior efforts above (and 
other works too [40,41]) simply turn off support for non-linear arithmetic in the 
solver, in order not to degrade verification performance across the board due to 
poor interaction of theories. Instead, users need to explicitly invoke lemmas.¹ 

For instance, here is a statement and proof of a lemma about Poly1305 in F*. 
The property and its proof do not really matter; the lines marked “(* argh! *)” 
do. In this particular proof, working around the solver’s inability to effectively 
reason about non-linear arithmetic, the programmer has spelled out basic facts 
about distributivity of multiplication and addition, by calling the library lemma 
distributivity_add_right, in order to guide the solver towards the proof. (Below, p44 
and p88 represent 2^44 and 2^88 respectively.) 


let lemma_carry_limb_unrolled (a0 a1 a2 : nat) : Lemma (ensures ( 
    a0 % p44 + p44 * ((a1 + a0 / p44) % p44) + p88 * (a2 + ((a1 + a0 / p44) / p44)) 
    == a0 + p44 * a1 + p88 * a2)) = 
  let z = a0 % p44 + p44 * ((a1 + a0 / p44) % p44) 
        + p88 * (a2 + ((a1 + a0 / p44) / p44)) in 
  distributivity_add_right p88 a2 ((a1 + a0 / p44) / p44); (* argh! *) 
  pow2_plus 44 44; 
  lemma_div_mod (a1 + a0 / p44) p44; 
  distributivity_add_right p44 ((a1 + a0 / p44) % p44) 
                               (p44 * ((a1 + a0 / p44) / p44)); (* argh! *) 
  assert (p44 * ((a1 + a0 / p44) % p44) + p88 * ((a1 + a0 / p44) / p44) 
          == p44 * (a1 + a0 / p44) ); 
  distributivity_add_right p44 a1 (a0 / p44); (* argh! *) 
  lemma_div_mod a0 p44 


Even at this relatively small scale, needing to explicitly instantiate the distributivity 
lemma is verbose and error prone. Even worse, the user is blind while 
doing so: the program text does not display the current set of available facts nor 
the final goal. Proofs at this level of abstraction are painfully detailed in some 
aspects, yet also heavily reliant on the SMT solver to fill in the aspects of the 
proof that are missing. 

1 Lemma (requires pre) (ensures post) is F* notation for the type of a computation 
proving pre => post; we omit pre when it is trivial. In F*’s standard library, math 
lemmas are proved using SMT with little or no interactions between problematic 
theory combinations. These lemmas can then be explicitly invoked in larger contexts, 
and are deleted during extraction. 

Given enough time, the solver can sometimes find a proof without the additional 
hints, but this is rare, dependent on context, and almost never 
robust. In this particular example we find by varying Z3’s random seed that, in 
an isolated setting, the lemma is proven automatically about 32% of the time. 
The numbers are much worse for more complex proofs, and where the context 
contains many facts, making this style quickly spiral out of control. For example, 
a proof of one of the main lemmas in Poly1305, poly_multiply, requires 41 steps of 
rewriting for associativity-commutativity of multiplication, and distributivity of 
addition and multiplication—making the proof much too long to show here. 


SMT and Tactics in Meta-F*. The listing below shows the statement and 
proof of poly_multiply in Meta-F*, of which the lemma above was previously only 
a small part. Again, the specific property proven is not particularly relevant to 
our discussion. But, this time, the proof contains just two steps. 


let poly_multiply (n p r h r0 r1 h0 h1 h2 s1 d0 d1 d2 h1 h2 hh : int) : Lemma 
  (requires p > 0 ∧ r1 > 0 ∧ n > 0 ∧ 4 * (n * n) == p + 5 ∧ r == r1 * n + r0 ∧ 
            h == h2 * (n * n) + h1 * n + h0 ∧ s1 == r1 + (r1 / 4) ∧ r1 % 4 == 0 ∧ 
            d0 == h0 * r0 + h1 * s1 ∧ d1 == h0 * r1 + h1 * r0 + h2 * s1 ∧ 
            d2 == h2 * r0 ∧ hh == d2 * (n * n) + d1 * n + d0) 
  (ensures (h * r) % p == hh % p) = 
  let r14 = r1 / 4 in 
  let h_r_expand = (h2 * (n * n) + h1 * n + h0) * ((r14 * 4) * n + r0) in 
  let hh_expand = (h2 * r0) * (n * n) + (h0 * (r14 * 4) + h1 * r0 
                  + h2 * (5 * r14)) * n + (h0 * r0 + h1 * (5 * r14)) in 
  let b = (h2 * n + h1) * r14 in 
  modulo_addition_lemma hh_expand p b; 
  assert (h_r_expand == hh_expand + b * (n * n * 4 + (-5))) 
    by (canon_semiring int_csr) (* Proof of this step by Meta-F* tactic *) 


First, we call a single lemma about modular addition from F*’s standard 
library. Then, we assert an equality annotated with a tactic (assert..by). Instead 
of encoding the assertion as-is to the SMT solver, it is preprocessed by the 
canon_semiring tactic. The tactic is presented with the asserted equality as its 
goal, in an environment containing not only all variables in scope but also 
hypotheses for the precondition of poly_multiply and the postcondition of the 
modulo_addition_lemma call (otherwise, the assertion could not be proven). The 
tactic will then canonicalize the sides of the equality, but notably only “up to” 
linear arithmetic conversions. Rather than fully canonicalizing the terms, the 
tactic just rewrites them into a sum-of-products canonical form, leaving all the 
remaining work to the SMT solver, which can then easily and robustly discharge 
the goal using linear arithmetic only. 
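
As a sanity check on the arithmetic (not a proof, and not part of the Meta-F* development), the asserted identity and the lemma's postcondition can be evaluated on concrete integers. The Python sketch below assumes the preconditions of poly_multiply as listed above; the function name poly_multiply_check and the choice of sample values are ours, purely for illustration.

```python
import random


def poly_multiply_check(n, r0, r14, h0, h1, h2):
    """Evaluate the asserted identity and the postcondition on concrete integers."""
    p = 4 * (n * n) - 5          # precondition: 4 * (n * n) == p + 5
    r1 = 4 * r14                 # guarantees r1 % 4 == 0
    r = r1 * n + r0
    h = h2 * (n * n) + h1 * n + h0
    s1 = r1 + r1 // 4
    d0 = h0 * r0 + h1 * s1
    d1 = h0 * r1 + h1 * r0 + h2 * s1
    d2 = h2 * r0
    hh = d2 * (n * n) + d1 * n + d0

    # The three local definitions from the body of the lemma.
    h_r_expand = (h2 * (n * n) + h1 * n + h0) * ((r14 * 4) * n + r0)
    hh_expand = ((h2 * r0) * (n * n)
                 + (h0 * (r14 * 4) + h1 * r0 + h2 * (5 * r14)) * n
                 + (h0 * r0 + h1 * (5 * r14)))
    b = (h2 * n + h1) * r14

    identity_holds = h_r_expand == hh_expand + b * (n * n * 4 + (-5))
    post_holds = (h * r) % p == hh % p
    return identity_holds, post_holds


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [[rng.randrange(1, 2**64) for _ in range(5)] for _ in range(100)]
    print(all(poly_multiply_check(2**64, *s) == (True, True) for s in samples))
```

Testing on samples only illustrates why the single canonicalization step suffices: once both sides are in sum-of-products form, the two polynomials coincide term by term, and the remaining modular step follows from the modulo_addition_lemma call.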

This tactic works over terms in the commutative semiring of integers (int_csr) 
using proof-by-reflection [12,20,36,38]. Internally, it is composed of a simpler, 
also proof-by-reflection based tactic canon_monoid that works over monoids, which 
is then “stacked” on itself to build canon_semiring. The basic idea of proof-by- 
reflection is to reduce most of the proof burden to mechanical computation, 
obtaining much more efficient proofs compared to repeatedly applying lemmas. 
For canon_monoid, we begin with a type for monoids, a small AST representing 
monoid values, and a denotation for expressions back into the monoid type. 


type monoid (a:Type) = { unit : a; mult : (a → a → a); (* + monoid laws ... *) } 
type exp (a:Type) = | Unit : exp a | Var : a → exp a | Mult : exp a → exp a → exp a 
(* Note on syntax: #a below denotes that a is an implicit argument *) 
let rec denote (#a:Type) (m:monoid a) (e:exp a) : a = 
  match e with 
  | Unit → m.unit | Var x → x | Mult x y → m.mult (denote m x) (denote m y) 


To canonicalize an exp, it is first converted to a list of operands (flatten) and then 
reflected back to the monoid (mldenote). The process is proven correct, in the 
particular case of equalities, by the monoid_reflect lemma. 


val flatten : #a:Type → exp a → list a 
val mldenote : #a:Type → monoid a → list a → a 
let monoid_reflect (#a:Type) (m:monoid a) (e1 e2 : exp a) 
  : Lemma (requires (mldenote m (flatten e1) == mldenote m (flatten e2))) 
          (ensures (denote m e1 == denote m e2)) = ... 


At this stage, if the goal is t1 == t2, we require two monoidal expressions e1 
and e2 such that t1 == denote m e1 and t2 == denote m e2. They are constructed 
by the tactic canon_monoid by inspecting the syntax of the goal, using Meta-F*’s 
reflection capabilities (detailed ahead in Sect. 3.3). We have no way to prove once 
and for all that the expressions built by canon_monoid correctly denote the terms, 
but this fact can be proven automatically at each application of the tactic, by 
simple unification. The tactic then applies the lemma monoid_reflect m e1 e2, and 
the goal is changed to mldenote m (flatten e1) == mldenote m (flatten e2). Finally, 
by normalization, each side will be canonicalized by running flatten and mldenote. 
The canon_semiring tactic follows a similar approach, and is similar to existing 
reflective tactics for other proof assistants [9,38], except that it only canonicalizes 
up to linear arithmetic, as explained above. The full VC for poly_multiply contains 
many other facts, e.g., that p is non-zero so the division is well-defined and that 
the postcondition does indeed hold. These obligations remain in a “skeleton” VC 
that is also easily proven by Z3. This proof is much easier for the programmer 
to write and much more robust, as detailed ahead in Sect. 6.1. The proof of 
Poly1305’s other main lemma, poly_reduce, is also similarly well automated. 
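
The reflective pipeline for monoids described above (exp, denote, flatten, mldenote) can be modelled in a few lines of Python. This is only an executable illustration under our own encoding choices (dataclasses for the AST, a right-nested fold for mldenote); it says nothing about how the F* tactic is implemented, and it checks the conclusion of monoid_reflect on examples rather than proving it.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Monoid:
    unit: Any
    mult: Callable[[Any, Any], Any]


@dataclass(frozen=True)
class Unit:
    pass


@dataclass(frozen=True)
class Var:
    value: Any


@dataclass(frozen=True)
class Mult:
    left: Any
    right: Any


def denote(m, e):
    """Interpret an expression back into the monoid."""
    if isinstance(e, Unit):
        return m.unit
    if isinstance(e, Var):
        return e.value
    return m.mult(denote(m, e.left), denote(m, e.right))


def flatten(e):
    """List the operands of an expression, left to right, dropping units."""
    if isinstance(e, Unit):
        return []
    if isinstance(e, Var):
        return [e.value]
    return flatten(e.left) + flatten(e.right)


def mldenote(m, xs):
    """Multiply a list of operands, nested to the right: x1 * (x2 * (... * unit))."""
    acc = m.unit
    for x in reversed(xs):
        acc = m.mult(x, acc)
    return acc


# String concatenation is a (non-commutative) monoid with unit "".
strings = Monoid(unit="", mult=lambda a, b: a + b)
e1 = Mult(Mult(Var("a"), Var("b")), Var("c"))
e2 = Mult(Var("a"), Mult(Unit(), Mult(Var("b"), Var("c"))))
```

Here e1 and e2 differ syntactically, but flatten maps both to the same operand list, so comparing the flattened forms is enough to conclude that their denotations agree.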


Tactic Proofs Without SMT. Of course, one can verify poly_multiply in Coq, 
following the same conceptual proof used in Meta-F*, but relying on tactics only. 
Our proof (included in the appendix) is 27 lines long, two of which involve the 
use of Coq’s ring tactic (similar to our canon_semiring tactic) and omega tactic for 
solving formulas in Presburger arithmetic. The remaining 25 lines include steps 
to destruct the propositional structure of terms, rewrite by equalities, and enrich 
the context to enable automatic modulo rewriting (Coq does not fully automatically 
recognize equality modulo p as an equivalence relation compatible with 
arithmetic operators). While a mature proof assistant like Coq has libraries and 
tools to ease this kind of manipulation, it can still be verbose. 

In contrast, in Meta-F* all of these mundane parts of a proof are simply 
dispatched to the SMT solver, which decides linear arithmetic efficiently, beyond 
the quantifier-free Presburger fragment supported by tactics like omega, handles 
congruence closure natively, etc. 


2.2 Tactics for Entire VCs and Separation Logic 


A different way to invoke Meta-F* is over an entire VC. While the exact shape 
of VCs is hard to predict, users with some experience can write tactics that find 
and solve particular sub-assertions within a VC, or simply massage them into 
shapes better suited for the SMT solver. We illustrate the idea on proofs for 
heap-manipulating programs. 

One verification method that has eluded F* until now is separation logic, 
the main reason being that the pervasive “frame rule” requires instantiating 
existentially quantified heap variables, which is a challenge for SMT solvers, and 
simply too tedious for users. With Meta-F*, one can do better. We have written 
a (proof-of-concept) embedding of separation logic and a tactic (sl_auto) that 
performs heap frame inference automatically. 

The approach we follow consists of designing the WP specifications for prim- 
itive stateful actions so as to make their footprint syntactically evident. The 
tactic then descends through VCs until it finds an existential for heaps arising 
from the frame rule. Then, by solving an equality between heap expressions 
(which requires canonicalization, for which we use a variant of canon_monoid 
targeting commutative monoids) the tactic finds the frames and instantiates 
the existentials. Notably, as opposed to other tactic frameworks for separation 
logic [4,45, 49,51], this is all our tactic does before dispatching to the SMT solver, 
which can now be effective over the instantiated VC. 

We now provide some detail on the framework. Below, ‘emp’ represents the 
empty heap, ‘•’ is the separating conjunction and ‘r ↦ v’ is the heaplet with 
the single reference r set to value v.^2 Our development distinguishes between 
a “heap” and its “memory” for technical reasons, but we will treat the two as 
equivalent here. Further, defined is a predicate discriminating valid heaps (as 
in [52]), i.e., those built from separating conjunctions of actually disjoint heaps. 

We first define the type of WPs and present the WP for the frame rule: 


let pre = memory → prop (* predicate on initial heaps *) 
let post a = a → memory → prop (* predicate on result values and final heaps *) 
let wp a = post a → pre (* transformer from postconditions to preconditions *) 

let frame_post (#a:Type) (p:post a) (m0:memory) : post a = 
  λx m1 → defined (m0 • m1) ∧ p x (m0 • m1) 
let frame_wp (#a:Type) (wp:wp a) (post:post a) (m:memory) = 
  ∃m0 m1. defined (m0 • m1) ∧ m == (m0 • m1) ∧ wp (frame_post post m1) m0 


Intuitively, frame_post p m0 behaves as the postcondition p “framed” by m0, i.e., 
frame_post p m0 x m1 holds when the two heaps m0 and m1 are disjoint and p 
holds over the result value x and the conjoined heaps. Then, frame_wp wp takes a 
postcondition p and initial heap m, and requires that m can be split into disjoint 
subheaps m0 (the footprint) and m1 (the frame), such that the postcondition p, 
when properly framed, holds over the footprint. 

2 This differs from the usual presentation where these three operators are heap predicates 
instead of heaps. 

In order to provide specifications for primitive actions we start in small- 
footprint style. For instance, below is the WP for reading a reference: 


let read_wp (#a:Type) (r:ref a) = λpost m0 → ∃x. m0 == r ↦ x ∧ post x m0 


We then insert framing wrappers around such small-footprint WPs when expos- 
ing the corresponding stateful actions to the programmer, e.g., 


val (!) : #a:Type → r:ref a → STATE a (λ p m → frame_wp (read_wp r) p m) 


To verify code written in such style, we annotate the corresponding programs to 
have their VCs processed by sl_auto. For instance, for the swap function below, the 
tactic successfully finds the frames for the four occurrences of the frame rule and 
greatly reduces the solver’s work. Even in this simple example, not performing 
such instantiation would cause the solver to fail. 


let swap_wp (r1 r2 : ref int) = 
  λp m → ∃x y. m == (r1 ↦ x • r2 ↦ y) ∧ p () (r1 ↦ y • r2 ↦ x) 

let swap (r1 r2 : ref int) : ST unit (swap_wp r1 r2) by (sl_auto ()) = 
  let x = !r1 in let y = !r2 in r1 := y; r2 := x 


The sl_auto tactic: (1) uses syntax inspection to unfold and traverse the goal 
until it reaches a frame_wp (say, the one for !r2); (2) inspects frame_wp’s first 
explicit argument (here read_wp r2) to compute the references the current command 
requires (here r2); (3) uses unification variables to build a memory expression 
describing the required framing of input memory (here r2 ↦ ?u1 • ?u2) and 
instantiates the existentials of frame_wp with these unification variables; (4) builds 
a goal that equates this memory expression with frame_wp’s third argument (here 
r1 ↦ x • r2 ↦ y); and (5) uses a commutative monoids tactic (similar to Sect. 2.1) 
with the heap algebra (emp, •) to canonicalize the equality and sort the heaplets. 
Next, it can solve for the unification variables component-wise, instantiating ?u1 
to y and ?u2 to r1 ↦ x, and then proceed to the next frame_wp. 
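
Steps (3) to (5) amount to splitting a memory expression into a footprint and a frame after sorting the heaplets. The Python sketch below mimics that split on a toy representation in which a memory is a list of (reference, value) pairs; the names canonicalize and infer_frame are hypothetical and the code is not the sl_auto tactic, which works on F* syntax and unification variables.

```python
def canonicalize(memory):
    """Sort heaplets by reference name: a canonical form for the
    commutative monoid (emp, •) on this toy representation."""
    return sorted(memory, key=lambda heaplet: heaplet[0])


def infer_frame(footprint_refs, memory):
    """Split `memory` into the values of the footprint and the remaining frame.

    footprint_refs: references the current command needs (e.g. {"r2"} for !r2).
    memory: list of (reference, value) heaplets joined by •.
    """
    canon = canonicalize(memory)
    refs = [r for r, _ in canon]
    if len(set(refs)) != len(refs):
        raise ValueError("memory is not defined: a reference occurs twice")
    footprint = {r: v for r, v in canon if r in footprint_refs}
    frame = [(r, v) for r, v in canon if r not in footprint_refs]
    missing = set(footprint_refs) - set(footprint)
    if missing:
        raise ValueError(f"references not found in memory: {sorted(missing)}")
    return footprint, frame


# Reading r2 in the memory r1 ↦ x • r2 ↦ y.
footprint, frame = infer_frame({"r2"}, [("r1", "x"), ("r2", "y")])
```

In this example the footprint value plays the role of ?u1 and the frame plays the role of ?u2 from the description above.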

In general, after frames are instantiated, the SMT solver can efficiently prove 
the remaining assertions, such as the obligations about heap definedness. Thus, 
with relatively little effort, Meta-F* brings an (albeit simple version of a) widely 
used yet previously out-of-scope program logic (i.e., separation logic) into F*. 
To the best of our knowledge, the ability to script separation logic into an SMT- 
based program verifier, without any primitive support, is unique. 


2.3 Metaprogramming Verified Low-Level Parsers and Serializers 


Above, we used Meta-F* to manipulate VCs for user-written code. Here, we focus 
instead on generating verified code automatically. We loosely refer to the previous 
setting as using “tactics”, and to the current one as “metaprogramming”. 
In most ITPs, tactics and metaprogramming are not distinguished; however in a 
program verifier like F*, where some proofs are not materialized at all (Sect. 4.1), 
proving VCs of existing terms is distinct from generating new terms. 

Metaprogramming in F* involves programmatically generating a (potentially 
effectful) term (e.g., by constructing its syntax and instructing F* how to type- 
check it) and processing any VCs that arise via tactics. When applicable (e.g., 
when working in a domain-specific language), metaprogramming verified code 
can substantially reduce, or even eliminate, the burden of manual proofs. 

We illustrate this by automating the generation of parsers and serializers 
from a type definition. Of course, this is a routine task in many mainstream 
metaprogramming frameworks (e.g., Template Haskell, camlp4, etc). The novelty 
here is that we produce imperative parsers and serializers extracted to C, with 
proofs that they are memory safe, functionally correct, and mutually inverse. 
This section is slightly simplified; more detail can be found in the appendix. 

We proceed in several stages. First, we program a library of pure, high-level 
parser and serializer combinators, proven to be (partial) mutual inverses of each 
other. A parser for a type t is represented as a function possibly returning a t 
along with the amount of input bytes consumed. The type of a serializer for a 
given p:parser t contains a refinement^3 stating that p is an inverse of the serializer. 
A package is a dependent record of a parser and an associated serializer. 


let parser t = seq byte → option (t * nat) 
let serializer #t (p:parser t) = f:(t → seq byte){∀ x. p (f x) == Some (x, length (f x))} 
type package t = { p : parser t ; s : serializer p } 


Basic combinators in the library include constructs for parsing and serializing 
base values and pairs, such as the following: 


val p_u8 : parser u8 
val s_u8 : serializer p_u8 
val p_pair : parser t1 → parser t2 → parser (t1 * t2) 
val s_pair : serializer p1 → serializer p2 → serializer (p_pair p1 p2) 
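
To make the shape of these combinators concrete, here is a small Python model of the high-level (specification) layer. It is a sketch under our own assumptions: a parser is a function from bytes to either None or a (value, bytes consumed) pair, and the refinement on serializers becomes a runtime check (roundtrips) rather than a type. None of this reflects the actual F* library code.

```python
def p_u8(data):
    """Parse one byte; return (value, bytes consumed) or None on empty input."""
    if len(data) < 1:
        return None
    return data[0], 1


def s_u8(x):
    """Serialize one byte (raises ValueError outside 0..255)."""
    return bytes([x])


def p_pair(p1, p2):
    """Run p1, then p2 on the remaining input, and pair up the results."""
    def parse(data):
        r1 = p1(data)
        if r1 is None:
            return None
        x1, n1 = r1
        r2 = p2(data[n1:])
        if r2 is None:
            return None
        x2, n2 = r2
        return (x1, x2), n1 + n2
    return parse


def s_pair(s1, s2):
    """Serialize both components and concatenate the outputs."""
    return lambda pair: s1(pair[0]) + s2(pair[1])


def roundtrips(p, s, x):
    """Runtime analogue of the refinement: p (s x) == Some (x, length (s x))."""
    out = s(x)
    return p(out) == (x, len(out))


p_bytes2 = p_pair(p_u8, p_u8)
s_bytes2 = s_pair(s_u8, s_u8)
```

In F* the roundtrip property is established once, by proof, for every input; the Python check can only be run on individual values.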


Next, we define low-level versions of these combinators, which work over muta- 
ble arrays instead of byte sequences. These combinators are coded in the Low* 
subset of F* (and so can be extracted to C) and are proven to both be 
memory-safe and respect their high-level variants. The type for low-level parsers, 
parser_impl (p:parser t), denotes an imperative function that reads from an array 
of bytes and returns a t, behaving as the specificational parser p. Conversely, a 
serializer_impl (s:serializer p) writes into an array of bytes, behaving as s. 

Given such a library, we would like to build verified, mutually inverse, low- 
level parsers and serializers for specific data formats. The task is mechanical, 
yet overwhelmingly tedious by hand, with many auxiliary proof obligations of a 
predictable structure: a perfect candidate for metaprogramming. 


Deriving Specifications from a Type Definition. Consider the following F* type, 
representing lists of exactly 18 pairs of bytes. 


type sample = nlist 18 (u8 * u8) 

3 F* syntax for refinements is x:t{φ}, denoting the type of all x of type t satisfying φ. 


The first component of our metaprogram is gen_specs, which generates parser 
and serializer specifications from a type definition. 


let ps_sample : package sample = _ by (gen_specs (`sample)) 


The syntax _ by τ is the way to call Meta-F* for code generation. Meta-F* will 
run the metaprogram τ and, if successful, replace the underscore by the result. In 
this case, gen_specs (`sample) inspects the syntax of the sample type (Sect. 3.3) 
and produces the package below (seq_p and seq_s are sequencing combinators): 


let ps_sample = { p = p_nlist 18 (p_u8 `seq_p` p_u8) 
                ; s = s_nlist 18 (s_u8 `seq_s` s_u8) } 


Deriving Low-Level Implementations that Match Specifications. From this pair 
of specifications, we can automatically generate Low* implementations for them: 


let p_low : parser_impl ps_sample.p = _ by gen_parser_impl 
let s_low : serializer_impl ps_sample.s = _ by gen_serializer_impl 


which will produce the following low-level implementations: 


let p_low = parse_nlist_impl 18ul (parse_u8_impl `seq_pi` parse_u8_impl) 
let s_low = serialize_nlist_impl 18ul (serialize_u8_impl `seq_si` serialize_u8_impl) 


For simple types like the one above, the generated code is fairly simple. However, 
for more complex types, using the combinator library comes with non-trivial 
proof obligations. For example, even for a simple enumeration, type color = Red 
| Green, the parser specification is as follows: 


parse_synth (parse_bounded_u8 2) 
  (λ x2 → mk_if_t (x2 = 0uy) (λ _ → Red) (λ _ → Green)) 
  (λ x → match x with | Green → 1uy | Red → 0uy) 


We represent Red with 0uy and Green with 1uy. The parser first parses a 
“bounded” byte, with only two values. The parse_synth combinator then expects 
functions between the bounded byte and the datatype being parsed (color), which 
must be proven to be mutual inverses. This proof is conceptually easy, but for 
large enumerations nested deep within the structure of other types, it is notori- 
ously hard for SMT solvers. Since the proof is inherently computational, a proof 
that destructs the inductive type into its cases and then normalizes is much more 
natural. With our metaprogram, we can produce the term and then discharge 
these proof obligations with a tactic on the spot, eliminating them from the final 
VC. We also explore simply tweaking the SMT context, again via a tactic, with 
good results. A quantitative evaluation is provided in Sect. 6.2. 


3 The Design of Meta-F* 


Having caught a glimpse of the use cases for Meta-F*, we now turn to its design. 
As usual in proof assistants (such as Coq, Lean and Idris), Meta-F* tactics work 
over a set of goals and apply primitive actions to transform them, possibly solving 
some goals and generating new goals in the process. Since this is standard, we 
focus mostly on describing the aspects where Meta-F* differs from other 
engines. We first describe how metaprograms are modelled as an effect (Sect. 3.1) 
and their runtime model (Sect. 3.2). We then detail some of Meta-F*’s syntax 
inspection and building capabilities (Sect. 3.3). Finally, we show how to perform 
some (lightweight) verification of metaprograms (Sect. 3.4) within F*. 


3.1 An Effect for Metaprogramming 



Meta-F* tactics are, at their core, programs that transform the “proof state”, 
i.e. a set of goals needing to be solved. As in Lean [30] and Idris [22], we define a 
monad combining exceptions and stateful computations over a proof state, along 
with actions that can access internal components such as the type-checker. For 
this we first introduce abstract types for the proof state, goals, terms, environ- 
ments, etc., together with functions to access them, some of them shown below. 


type proofstate      val goals_of : proofstate → list goal 
type goal            val goal_env : goal → env 
type term            val goal_type : goal → term 
type env             val goal_solution : goal → term 


We can now define our metaprogramming monad: tac. It combines F*’s existing 
effect for potential divergence (Div), with exceptions and stateful computations 
over a proofstate. The definition of tac, shown below, is straightforward and given 
in F*’s standard library. Then, we use F*’s effect extension capabilities [1] in 
order to elevate the tac monad and its actions to an effect, dubbed TAC. 


type error = exn * proofstate (* error and proofstate at the time of failure *) 
type result a = | Success : a → proofstate → result a | Failed : error → result a 
let tac a = proofstate → Div (result a) 
let t_return #a (x:a) = λps → Success x ps 
let t_bind #a #b (m:tac a) (f:a → tac b) : tac b = λps → ... (* omitted, yet simple *) 
let get () : tac proofstate = λps → Success ps ps 
let raise #a (e:exn) : tac a = λps → Failed (e, ps) 
new_effect { TAC with repr = tac ; return = t_return ; bind = t_bind 
           ; get = get ; raise = raise } 


The new_effect declaration introduces computation types of the form TAC t wp, 
where t is the return type and wp a specification. However, until Sect. 3.4 we shall 
only use the derived form Tac t, where the specification is trivial. These com- 
putation types are distinct from their underlying monadic representation type 
tac t: users cannot directly access the proof state except via the actions. The 
simplest actions stem from the tac monad definition: get : unit → Tac proofstate 
returns the current proof state and raise : exn → Tac α fails with the given exception.^4 
Failures can be handled using catch : (unit → Tac α) → Tac (either exn α), 
which resets the state on failure, including that of unification metavariables. 
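
The state-and-exception structure of tac can be mirrored directly in Python, which may help in reading the definitions above. The sketch below is a model only: proof states are plain tuples of goal names, t_bind is our guess at the omitted (and standard) definition, and solve_first is a hypothetical primitive introduced solely so that the state-resetting behaviour of catch is observable.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Success:
    value: Any
    ps: Any


@dataclass(frozen=True)
class Failed:
    exn: Exception
    ps: Any        # proof state at the time of failure


def t_return(x):
    return lambda ps: Success(x, ps)


def t_bind(m, f):
    """Run m; on success feed its value and new state to f, on failure stop."""
    def run(ps):
        r = m(ps)
        if isinstance(r, Failed):
            return r
        return f(r.value)(r.ps)
    return run


def get():
    return lambda ps: Success(ps, ps)


def raise_(e):
    return lambda ps: Failed(e, ps)


def catch(m):
    """Run m; on failure return the exception and restore the initial state."""
    def run(ps):
        r = m(ps)
        if isinstance(r, Failed):
            return Success(("Inl", r.exn), ps)
        return Success(("Inr", r.value), r.ps)
    return run


def solve_first(ps):
    """Hypothetical primitive: drop the first goal, failing if there is none."""
    if not ps:
        return Failed(IndexError("no goals"), ps)
    return Success(None, ps[1:])
```

For instance, catching a computation that first runs solve_first and then fails yields an Inl result together with the original, untouched proof state.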


4 We use Greek letters α, β, ... to abbreviate universally quantified type variables. 

We emphasize two points here. First, there is no “set” action. This is to forbid 
metaprograms from arbitrarily replacing their proof state, which would be 
unsound. Second, the argument to catch must be thunked, since in F* impure 
un-suspended computations are evaluated before they are passed into functions. 
The only aspect differentiating Tac from other user-defined effects is the exis- 
tence of effect-specific primitive actions, which give access to the metaprogram- 
ming engine proper. We list here but a few: 


val trivial : unit → Tac unit    val tc : term → Tac term    val dump : string → Tac unit 


All of these are given an interpretation internally by Meta-F*. For instance, trivial 
calls into F*’s logical simplifier to check whether the current goal is a trivial 
proposition and discharges it if so, failing otherwise. The tc primitive queries the 
type-checker to infer the type of a given term in the current environment (F* 
types are a kind of terms, hence the codomain of tc is also term). This does not 
change the proof state; its only purpose is to return useful information to the 
calling metaprograms. Finally, dump outputs the current proof state to the user 
in a pretty-printed format, in support of user interaction. 

Having introduced the Tac effect and some basic actions, writing metapro- 
grams is as straightforward as writing any other F* code. For instance, here are 
two metaprogram combinators. The first one repeatedly calls its argument until 
it fails, returning a list of all the successfully-returned values. The second one 
behaves similarly, but folds the results with some provided folding function. 


let rec repeat (τ : unit → Tac α) : Tac (list α) = 
  match catch τ with | Inl _ → [] | Inr x → x :: repeat τ 

let repeat_fold f e τ = fold_left f e (repeat τ) 


These two small combinators illustrate a few key points of Meta-F*. As for all 
other F* effects, metaprograms are written in applicative style, without explicit 
return, bind, or lift of computations (which are inserted under the hood). This 
also works across different effects: repeat_fold can seamlessly combine the pure 
fold_left from F*’s list library with a metaprogram like repeat. Metaprograms are 
also type- and effect-inferred: while repeat_fold was not at all annotated, F* infers 
the polymorphic type (β → α → β) → β → (unit → Tac α) → Tac β for it. 
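
For readers more familiar with mainstream languages, the behaviour of the two combinators can be imitated in Python. In this sketch (our own encoding, not F* semantics) a metaprogram is a function from a proof state to a (value, new state) pair that signals failure by raising TacFailure, and pop_goal is a hypothetical tactic used only to drive the example.

```python
from functools import reduce


class TacFailure(Exception):
    """Failure of a (modelled) metaprogram."""


def repeat(tau, ps):
    """Call tau until it fails; return all results and the last good state."""
    results = []
    while True:
        try:
            x, ps_next = tau(ps)
        except TacFailure:
            # As with catch, the state from before the failing call is kept.
            return results, ps
        results.append(x)
        ps = ps_next


def repeat_fold(f, e, tau, ps):
    """Like repeat, but fold the results from the left with f, starting at e."""
    xs, ps = repeat(tau, ps)
    return reduce(f, xs, e), ps


def pop_goal(ps):
    """Hypothetical tactic: return the first goal and remove it from the state."""
    if not ps:
        raise TacFailure("no goals left")
    return ps[0], ps[1:]
```

Note that the explicit threading of ps, which F* inserts under the hood for Tac computations, has to be written out by hand here.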

It should be noted that, if lacking an effect extension feature, one could 
embed metaprograms simply via the (properly abstracted) tac monad instead of 
the Tac effect. It is just more convenient to use an effect, given we are working 
within an effectful program verifier already. In what follows, with the exception 
of Sect. 3.4 where we describe specifications for metaprograms, there is little 
reliance on using an effect; so, the same ideas could be applied in other settings. 


3.2 Executing Meta-F* Metaprograms 


Running metaprograms involves three steps. First, they are reified [1] into their 
underlying tac representation, i.e. as state-passing functions. User code cannot 
reify metaprograms: only F* can do so when about to process a goal. 

Second, the reified term is applied to an initial proof state, and then simply 
evaluated according to F*’s dynamic semantics, for instance using F*’s existing 
normalizer. For intensive applications, such as proofs by reflection, we provide 
faster alternatives (Sect. 5). In order to perform this second step, the proof state, 
which up until this moment exists only internally to F*, must be embedded as 
a term, i.e., as abstract syntax. Here is where its abstraction pays off: since 
metaprograms cannot interact with a proof state except through a limited inter- 
face, it need not be deeply embedded as syntax. By simply wrapping the internal 
proofstate into a new kind of “alien” term, and making the primitives aware of 
this wrapping, we can readily run the metaprogram that safely carries its alien 
proof state around. This wrapping of proof states is a constant-time operation. 

The third step is interpreting the primitives. They are realized by functions 
of similar types implemented within the F* type-checker, but over an internal 
tac monad and the concrete definitions for term, proofstate, etc. Hence, there is 
a translation involved on every call and return, switching between embedded 
representations and their concrete variants. Take dump, for example, with type 
string → Tac unit. Its internal counterpart, implemented within the F* type-checker, 
has type string → proofstate → Div (result unit). When interpreting a call 
to it, the interpreter must unembed the arguments (which are representations of 
F* terms) into a concrete string and a concrete proofstate to pass to the internal 
implementation of dump. The situation is symmetric for the return value of the 
call, which must be embedded as a term. 


3.3 Syntax Inspection, Generation, and Quotation 


If metaprograms are to be reusable over different kinds of goals, they must be 
able to reflect on the goals they are invoked to solve. Like any metaprogramming 
system, Meta-F* offers a way to inspect and construct the syntax of F* terms. 
Our representation of terms as an inductive type, and the variants of quotations, 
are inspired by the ones in Idris [22] and Lean [30]. 


Inspecting Syntax. Internally, F* uses a locally-nameless representation [21] 
with explicit, delayed substitutions. To shield metaprograms from some of this 
internal bureaucracy, we expose a simplified view [61] of terms. Below we present 
a few constructors from the term_view type: 

val inspect : term → Tac term_view 
val pack : term_view → term 

type term_view = 
  | Tv_BVar : v:dbvar → term_view 
  | Tv_Var : v:iname → term_view 
  | Tv_FVar : v:qname → term_view 
  | Tv_Abs : bv:binder → body:term → term_view 
  | Tv_App : hd:term → arg:term → term_view 


The term_view type provides the “one-level-deep” structure of a term: metapro- 
grams must call inspect to reveal the structure of the term, one constructor at a 
time. The view exposes three kinds of variables: bound variables, Tv_BVar; named 
local variables, Tv_Var; and top-level fully qualified names, Tv_FVar. Bound variables 
and local variables are distinguished since the internal abstract syntax 
is locally nameless. For metaprogramming, it is usually simpler to use a fully- 
named representation, so we provide inspect and pack functions that open and 
close binders appropriately to maintain this invariant. Since opening binders 
requires freshness, inspect has effect Tac.^5 As generating large pieces of syntax 
via the view easily becomes tedious, we also provide some ways of quoting terms: 


Static Quotations. A static quotation `e is just a shorthand for statically 
calling the F* parser to convert e into the abstract syntax of F* terms above. 
For instance, `(f 1 2) is equivalent to the following, 


pack (Tv_App (pack (Tv_App (pack (Tv_FVar "f")) 
(pack (Tv_Const (C_Int 1))))) 
(pack (Tv_Const (C_Int 2)))) 


Dynamic Quotations. A second form of quotation is dquote : #a:Type → a → 
Tac term, an effectful operation that is interpreted by F*’s normalizer during 
metaprogram evaluation. It returns the syntax of its argument at the time 
dquote e is evaluated. Evaluating dquote e substitutes all the free variables in 
e with their current values in the execution environment, suspends further eval- 
uation, and returns the abstract syntax of the resulting term. For instance, 
evaluating (λx → dquote (x + 1)) 16 produces the abstract syntax of 16 + 1. 


Anti-quotations. Static quotations are useful for building big chunks of syntax 
concisely, but they are of limited use if we cannot combine them with existing bits 
of syntax. Subterms of a quotation are allowed to “escape” and be substituted by 
arbitrary expressions. We use the syntax `#t to denote an antiquoted t, where t 
must be an expression of type term in order for the quotation to be well-typed. 
For example, `(1 + `#e) creates syntax for an addition where one operand is the
integer constant 1 and the other is the term represented by e. 
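The quotation forms above can be illustrated outside F* with a small Python model of the term view. This is only a sketch: the class names mirror the Tv_ constructors, pack and inspect are elided, constants are restricted to integers, and the name op_Addition for the addition operator is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Tv_FVar:
    name: str  # top-level fully qualified name


@dataclass(frozen=True)
class Tv_Const:
    value: int  # models integer constants (C_Int) only


@dataclass(frozen=True)
class Tv_App:
    head: "Term"
    arg: "Term"


Term = Union[Tv_FVar, Tv_Const, Tv_App]


def app(head: Term, *args: Term) -> Term:
    """Build a left-nested application, one argument at a time."""
    for a in args:
        head = Tv_App(head, a)
    return head


def plus_one(e: Term) -> Term:
    """Models `(1 + `#e): fixed syntax with a hole filled by the term e.

    "op_Addition" is a hypothetical name for the addition operator.
    """
    return app(Tv_FVar("op_Addition"), Tv_Const(1), e)


# Models the static quotation `(f 1 2).
quoted = app(Tv_FVar("f"), Tv_Const(1), Tv_Const(2))
```

The value quoted has the same nesting as the pack expression shown for `(f 1 2), and plus_one plays the role of an anti-quotation: its argument must already be a piece of syntax.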


Unquotation. Finally, we provide an effectful operation, unquote: #a:Type →
t:term → Tac a, which takes a term representation t and an expected type for it a
(usually inferred from the context), and calls the F* type-checker to check and 
elaborate the term representation into a well-typed term. 


3.4 Specifying and Verifying Metaprograms 


Since we model metaprograms as a particular kind of effectful program within 
F*, which is a program verifier, a natural question to ask is whether F* can 
specify and verify metaprograms. The answer is “yes, to a degree”. 

To do so, we must use the WP calculus for the TAC effect: TAC-computations
are given computation types of the form TAC a wp, where a is the computation’s
result type and wp is a weakest-precondition transformer of type tacwp a
= proofstate → (result a → prop) → prop. However, since WPs tend to not be very
intuitive, we first define two variants of the TAC effect: TacH in “Hoare-style” with
pre- and postconditions and Tac (which we have seen before), which only
specifies the return type, but uses trivial pre- and postconditions. The requires and
ensures keywords below simply aid readability of pre- and postconditions; they
are identity functions.

5 We also provide functions inspect_ln, pack_ln which stay in a locally-nameless
representation and are thus pure, total functions.


effect TacH (a:Type) (pre : proofstate → prop) (post : proofstate → result a → prop) =
  TAC a (λ ps post’ → pre ps ∧ (∀ r. post ps r ==> post’ r))
effect Tac (a:Type) = TacH a (requires (λ _ → ⊤)) (ensures (λ _ _ → ⊤))


Previously, we only showed the simple type for the raise primitive, namely exn →
Tac a. In fact, in full detail and Hoare style, its type/specification is: 


val raise : e:exn → TacH a (requires (λ _ → ⊤))
                           (ensures (λ ps r → r == Failed (e, ps)))


expressing that the primitive has no precondition, always fails with the provided 
exception, and does not modify the proof state. From the specifications of the 
primitives, and the automatically obtained Dijkstra monad, F* can already prove 
interesting properties about metaprograms. We show a few simple examples. 

The following metaprogram is accepted by F* as it can conclude, from the 
type of raise, that the assertion is unreachable, and hence raise_flow can have a 
trivial precondition (as Tac unit implies). 


let raise_flow () : Tac unit = raise SomeExn; assert ⊥
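To make the control flow behind raise_flow concrete, the following Python sketch models a reified metaprogram as a function from a proof state to a result. It is an illustration under simplifying assumptions (a proof state is just a tuple of goal names, and ret, bind and raise_ are hypothetical helper names), and it checks dynamically what F* establishes statically from the specification of raise.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Success:
    value: Any
    ps: tuple  # resulting proof state (here: a tuple of goal names)


@dataclass(frozen=True)
class Failed:
    exn: Exception
    ps: tuple  # proof state at the point of failure


# A reified metaprogram: proof state -> result.
Tac = Callable[[tuple], Any]


def ret(x: Any) -> Tac:
    return lambda ps: Success(x, ps)


def bind(m: Tac, k: Callable[[Any], Tac]) -> Tac:
    """Run m; on success pass its value and state to k, otherwise stop."""
    def run(ps: tuple):
        r = m(ps)
        return k(r.value)(r.ps) if isinstance(r, Success) else r
    return run


def raise_(e: Exception) -> Tac:
    """Always fails with e and leaves the proof state untouched."""
    return lambda ps: Failed(e, ps)


reached = []


def after_raise(_):
    reached.append(True)  # would run only if raise_ had succeeded
    return ret(None)


raise_flow = bind(raise_(ValueError("SomeExn")), after_raise)
result = raise_flow(("goal1", "goal2"))
```

Running raise_flow yields a Failed result carrying the original proof state, and the continuation standing in for the assertion is never executed.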


For cur_goal_safe below, F* verifies that (given the precondition) the pattern 
match is exhaustive. The postcondition is also asserting that the metaprogram 
always succeeds without affecting the proof state, returning some unspecified 
goal. Calls to cur_goal_safe must statically ensure that the goal list is not empty. 


let cur_goal_safe () : TacH goal (requires (λ ps → ¬(goals_of ps == [])))
                                 (ensures (λ ps r → ∃g. r == Success g ps)) =
  match goals_of (get ()) with | g :: _ → g


Finally, the divide combinator below “splits” the goals of a proof state in two at a 
given index n, and focuses a different metaprogram on each. It includes a runtime 
check that the given n is non-negative, and raises an exception in the TAC effect 
otherwise. Afterwards, the call to the (pure) List.splitAt function requires that 
n be statically known to be non-negative, a fact which can be proven from the 
specification for raise and the effect definition, which defines the control flow. 


let divide (n:int) (tl : unit → Tac α) (tr : unit → Tac β) : Tac (α * β) =
  if n < 0 then raise NegativeN;
  let gsl, gsr = List.splitAt n (goals ()) in ...
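The interplay between the dynamic check and the precondition of List.splitAt can be mimicked in Python as follows. This is a loose analogue only: split_at, divide_goals and the NegativeN class are hypothetical names, the precondition is an assert executed at run time rather than a statically discharged obligation, and the two sub-metaprograms are omitted.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


class NegativeN(Exception):
    """Hypothetical exception mirroring NegativeN in the text."""


def split_at(n: int, xs: List[T]) -> Tuple[List[T], List[T]]:
    """Pure split; the precondition n >= 0 is the caller's responsibility."""
    assert n >= 0, "precondition violated"
    return xs[:n], xs[n:]


def divide_goals(n: int, goals: List[T]) -> Tuple[List[T], List[T]]:
    # Dynamic check: past this point, control flow guarantees n >= 0,
    # so the precondition of split_at can never fail.
    if n < 0:
        raise NegativeN(n)
    return split_at(n, goals)
```

In F*, the fact that the assert inside split_at cannot fire is proven once, at verification time, from the specification of raise.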


This enables a style of “lightweight” verification of metaprograms, where
expressive invariants about their state and control-flow can be encoded. The
programmer can exploit dynamic checks (n < 0) and exceptions (raise) or static ones
(preconditions), or a mixture of them, as needed.

Due to type abstraction, though, the specifications of most primitives cannot
provide complete detail about their behavior, and deeper specifications (such as
ensuring a tactic will correctly solve a goal) cannot currently be proven, nor even
stated; to do so would require, at least, an internalization of the typing judgment
of F*. While this is an exciting possibility [3], we have for now only focused on
verifying basic safety properties of metaprograms, which helps users detect errors
early, and whose proofs the SMT solver can handle well. In principle, though, one
can also write tactics to discharge the proof obligations of metaprograms.


4 Meta-F*, Formally 


We now describe the trust assumptions for Meta-F* (Sect. 4.1) and then how we 
reconcile tactics within a program verifier, where the exact shape of VCs is not 
given, nor known a priori by the user (Sect. 4.2). 


4.1 Correctness and Trusted Computing Base (TCB) 


As in any proof assistant, tactics and metaprogramming would be rather useless
if they allowed one to “prove” invalid judgments, so care must be taken to ensure
soundness. We begin with a taste of the specifics of F*’s static semantics, which
influence the trust model for Meta-F*, and then provide more detail on the TCB.


Proof Irrelevance in F*. The following two rules for introducing and eliminat- 
ing refinement types are key in F*, as they form the basis of its proof irrelevance. 


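\[
\textsc{T-Refine}\ \ \frac{\Gamma \vdash e : t \qquad \Gamma \models \phi[e/x]}{\Gamma \vdash e : x{:}t\{\phi\}}
\qquad\qquad
\textsc{V-Refine}\ \ \frac{\Gamma \vdash e : x{:}t\{\phi\}}{\Gamma \models \phi[e/x]}
\]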


The ⊨ symbol represents F*’s validity judgment [1] which, at a high level,
defines a proof-irrelevant, classical, higher-order logic. These validity hypotheses
are usually collected by the type-checker, and then encoded to the SMT solver
in bulk. Crucially, the irrelevance of validity is what permits efficient interaction
with SMT solvers, since reconstructing F* terms from SMT proofs is unneeded.

As evidenced in the rules, validity and typing are mutually recursive, and
therefore Meta-F* must also construct validity derivations. In the
implementation, we model these validity goals as holes with a “squash” type [5,53], where
squash φ = _:unit{φ}, i.e., a refinement of unit. Concretely, we model Γ ⊨ φ as
Γ ⊢ ?u : squash φ using a unification variable. Meta-F* does not construct deep
solutions to squashed goals: if they are proven valid, the variable ?u is simply
solved by the unit value ‘()’. At any point, any such irrelevant goal can be sent
to the SMT solver. Relevant goals, on the other hand, cannot be sent to SMT.


Scripting the Typing Judgment. A consequence of validity proofs not being
materialized is that type-checking is undecidable in F*. For instance: does the
unit value () solve the hole Γ ⊢ ?u : squash φ? Well, only if φ holds, a
condition which no type-checker can effectively decide. This implies that the
type-checker cannot, in general, rely on proof terms to reconstruct a proof. Hence, the
primitives are designed to provide access to the typing judgment of F* directly,
instead of building syntax for proof terms. One can think of F*’s type-checker 
as implementing one particular algorithmic heuristic of the typing and validity 
judgments—a heuristic which happens to work well in practice. For convenience, 
this default type-checking heuristic is also available to metaprograms: this is in 
fact precisely what the exact primitive does. Having programmatic access to 
the typing judgment also provides the flexibility to tweak VC generation as 
needed, instead of leaving it to the default behavior of F*. For instance, the 
refine_intro primitive implements T-REFINE. When applied, it produces two new 
goals, including that the refinement actually holds. At that point, a metapro- 
gram can run any arbitrary tactic on it, instead of letting the F* type-checker 
collect the obligation and send it to the SMT solver in bulk with others. 


Trust. There are two common approaches for the correctness of tactic engines: 
(1) the de Bruijn criterion [6], which requires constructing full proofs (or proof 
terms) and checking them at the end, hence reducing trust to an indepen- 
dent proof-checker; and (2) the LCF style, which applies backwards reasoning 
while constructing validation functions at every step, reducing trust to primitive, 
forward-style implementations of the system’s inference rules. 

As we wish to make use of SMT solvers within F*, the first approach is 
not easy. Reconstructing the proofs SMT solvers produce, if any, back into a 
proper derivation remains a significant challenge (even despite recent progress, 
e.g. [17,31]). Further, the logical encoding from F* to SMT, along with the 
solver itself, are already part of F*’s TCB: shielding Meta-F* from them would 
not significantly increase safety of the combined system. 

Instead, we roughly follow the LCF approach and implement F*’s typing 
rules as the basic user-facing metaprogramming actions. However, instead of 
implementing the rules in forward-style and using them to validate (untrusted) 
backwards-style tactics, we implement them directly in backwards-style. That is, 
they run by breaking down goals into subgoals, instead of combining proven facts 
into new proven facts. Using LCF style makes the primitives part of the TCB. 
However, given the primitives are sound, any combination of them also is, and 
any user-provided metaprogram must be safe due to the abstraction imposed by 
the Tac effect, as discussed next. 


Correct Evolutions of the Proof State. For soundness, it is imperative that 
tactics do not arbitrarily drop goals from the proof state, and only discharge 
them when they are solved, or when they can be solved by other goals tracked 
in the proof state. For a concrete example, consider the following program: 


let f : int → int = _ by (intro (); exact (`42))


Here, Meta-F* will create an initial proof state with a single goal of the form
[∅ ⊢ ?u1 : int → int] and begin executing the metaprogram. When applying the
intro primitive, the proof state transitions as shown below.


[∅ ⊢ ?u1 : int → int] ⇝ [x:int ⊢ ?u2 : int]


Here, a solution to the original goal has not yet been built, since it depends on
the solution to the goal on the right hand side. When it is solved with, say, 42,
we can solve our original goal with λx → 42. To formalize these dependencies, we
say that a proof state φ correctly evolves (via f) to ψ, denoted φ ⪯_f ψ, when
there is a generic transformation f, called a validation, from solutions to all of
ψ’s goals into correct solutions for φ’s goals. When φ has n goals and ψ has m
goals, the validation f is a function from term^m into term^n. Validations may be
composed, providing the transitivity of correct evolution, and if a proof state φ
correctly evolves (in any number of steps) into a state with no more goals, then
we have fully defined solutions to all of φ’s goals. We emphasize that validations
are not constructed explicitly during the execution of metaprograms. Instead we
exploit unification metavariables to instantiate the solutions automatically.
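Although Meta-F* never builds validations explicitly, spelling them out for the intro/exact example may help. The Python sketch below represents terms as plain strings and validations as functions from a list of solutions (for the new goals) to a list of solutions (for the old goals); the helper names and the concrete fun-syntax are assumptions made for illustration.

```python
from typing import Callable, List

# A validation maps solutions of the new goals to solutions of the old goals.
Validation = Callable[[List[str]], List[str]]


def intro_validation(binder: str) -> Validation:
    """Validation induced by intro: abstract the body's solution over binder."""
    return lambda sols: [f"fun {binder} -> {sols[0]}"]


def exact_validation(term: str) -> Validation:
    """Validation induced by exact: no goals remain, the term is the solution."""
    return lambda sols: [term]


def compose(first: Validation, then: Validation) -> Validation:
    """If phi evolves to psi via first, and psi to chi via then,
    phi evolves to chi via their composition (transitivity)."""
    return lambda sols: first(then(sols))


whole = compose(intro_validation("x"), exact_validation("42"))
solution = whole([])  # the final proof state has no goals left
```

Here whole is a function from term^0 into term^1: given no solutions at all, it produces the solution of the single original goal.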

Note that validations may construct solutions for more than one goal, i.e., 
their codomain is not a single term. This is required in Meta-F*, where primitive 
steps may not only decompose goals into subgoals, but actually combine goals 
as well. Currently, the only primitive providing this behavior is join, which finds 
a maximal common prefix of the environment of two irrelevant goals, reverts 
the “extra” binders in both goals and builds their conjunction. Combining goals 
using join is especially useful for sending multiple goals to the SMT solver in a 
single call. When there are common obligations within two goals, joining them 
before calling the SMT solver can result in a significantly faster proof. 
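The behaviour described for join can be approximated by the following Python sketch. It is a simplification under stated assumptions: a goal is a pair of an environment (a list of (name, type) binders) and a formula given as a string, only variable binders are modelled (hypotheses in the environment are ignored), and reverting a binder is rendered as a universal quantifier.

```python
from typing import List, Tuple

Binder = Tuple[str, str]          # (name, type)
Goal = Tuple[List[Binder], str]   # (environment, formula)


def revert(extra: List[Binder], phi: str) -> str:
    """Move binders from the environment back into the formula,
    innermost binder first."""
    for name, ty in reversed(extra):
        phi = f"(forall ({name}:{ty}). {phi})"
    return phi


def join(goal_a: Goal, goal_b: Goal) -> Goal:
    env_a, phi_a = goal_a
    env_b, phi_b = goal_b
    # Length of the maximal common prefix of the two environments.
    k = 0
    while k < min(len(env_a), len(env_b)) and env_a[k] == env_b[k]:
        k += 1
    left = revert(env_a[k:], phi_a)
    right = revert(env_b[k:], phi_b)
    return env_a[:k], left + " /\\ " + right


joined = join(
    ([("x", "int"), ("y", "int")], "y > x"),
    ([("x", "int"), ("z", "nat")], "z >= 0"),
)
```

The two goals share the binder x, so the joined goal lives in the environment containing only x, and each conjunct quantifies over its own extra binder.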

We check that every primitive action respects the ⪯ preorder. This relies on
them modeling F*’s typing rules. For example, and unsurprisingly, the following 
rule for typing abstractions is what justifies the intro primitive: 


\[
\textsc{T-Fun}\ \ \frac{\Gamma, x{:}t \vdash e : t'}{\Gamma \vdash \lambda(x{:}t).\, e : (x{:}t) \to t'}
\]


Then, for the proof state evolution above, the validation function f is the (math- 
ematical, meta-level) function taking a term of type int (the solution for ?u2) and 
building syntax for its abstraction over x. Further, the intro primitive respects 
the correct-evolution preorder, by the very typing rule (T-Fun) from which it is 
defined. In this manner, every typing rule induces a syntax-building metapro- 
gramming step. Our primitives come from this dual interpretation of typing 
rules, which ensures that logical consistency is preserved. 

Since the ⪯ relation is a preorder, and every metaprogramming primitive we
provide the user evolves the proof state according to ⪯, it is trivially the case that
the final proof state returned by a (successful) computation is a correct evolution
of the initial one. That means that when the metaprogram terminates, one has
indeed broken down the proof obligation correctly, and is left with a (hopefully)
simpler set of obligations to fulfill. Note that since ⪯ is a preorder, Tac provides
an interesting example of monotonic state [2].


4.2 Extracting Individual Assertions


As discussed, the logical context of a goal processed by a tactic is not always 
syntactically evident in the program. And, as shown in the List.splitAt call in 
divide from Sect. 3.4, some obligations crucially depend on the control-flow of 
the program. Hence, the proof state must crucially include these assumptions if 
proving the assertion is to succeed. Below, we describe how Meta-F* finds proper 
contexts in which to prove the assertions, including control-flow information. 
Notably, this process is defined over logical formulae and does not depend at all 
on F*’s WP calculus or VC generator: we believe it should be applicable to any 
VC generator. 

As seen in Sect. 2.1, the basic mechanism by which Meta-F* attaches a tactic
to a specific sub-goal is assert φ by τ. Our encoding of this expression is built
similarly to F*’s existing assert construct, which is simply sugar for a pure function
assert of type φ:prop → Lemma (requires φ) (ensures φ), which essentially
introduces a cut in the generated VC. That is, the term (assert φ; e) roughly produces
the verification condition φ ∧ (φ ==> VC_e), requiring a proof of φ at this point,
and assuming φ in the continuation. For Meta-F*, we aim to keep this style
while allowing asserted formulae to be decorated with user-provided tactics that
are tasked with proving or pre-processing them. We do this in three steps.

First, we define the following “phantom” predicate: 


let with_tactic (φ : prop) (τ : unit → Tac unit) = φ


Here φ `with_tactic` τ simply associates the tactic τ with φ, and is equivalent to
φ by its definition. Next, we implement the assert_by_tactic lemma, and desugar
assert φ by τ into assert_by_tactic φ τ. This lemma is trivially provable by F*.


let assert_by_tactic (φ : prop) (τ : unit → Tac unit)
  : Lemma (requires (φ `with_tactic` τ)) (ensures φ) = ()


Given this specification, the term (assert φ by τ; e) roughly produces the
verification condition φ `with_tactic` τ ∧ (φ ==> VC_e), with a tagged left sub-goal, and φ
as a hypothesis in the right one. Importantly, F* keeps the with_tactic marker
uninterpreted until the VC needs to be discharged. At that point, it may
contain several annotated subformulae. For example, suppose the VC is VC0 below,
where we distinguish an ambient context of variables and hypotheses Δ:


(VC0) Δ ⊨ X ==> (∀ (x:t). R `with_tactic` τ1 ∧ (R ==> S))


In order to run the τ1 tactic on R, it must first be “split out”. To do so, all logical
information “visible” for τ1 (i.e. the set of premises of the implications traversed
and the binders introduced by quantifiers) must be included. As for any program
verifier, these hypotheses include the control flow information, postconditions,
and any other logical fact that is known to be valid at the program point where
the corresponding assert R by τ1 was called. All of them are collected into Δ as
the term is traversed. In this case, the VC for R is:


(VC1) Δ, _:X, x:t ⊨ R


Afterwards, this obligation is removed from the original VC. This is done by
replacing it with ⊤, leaving a “skeleton” VC with all remaining facts.


(VC2) Δ ⊨ X ==> (∀ (x:t). ⊤ ∧ (R ==> S))


The validity of VC1 and VC2 implies that of VC0. F* also recursively descends
into R and S, in case there are more with_tactic markers in them. Then, tactics
are run on the split VCs (e.g., τ1 on VC1) to break them down (or solve
them). All remaining goals, including the skeleton, are sent to the SMT solver.

Note that while the obligation to prove R, in VC1, is preprocessed by the 
tactic τ1, the assumption R for the continuation of the code, in VC2, is left as-is.
This is crucial for tactics such as the canonicalizer from Sect. 2.1: if the skeleton 
VC2 contained an assumption for the canonicalized equality it would not help 
the SMT solver show the uncanonicalized postcondition. 

However, not all nodes marked with with_tactic are proof obligations. Suppose
X in the previous VC was given as (Y `with_tactic` τ2). In this case, one certainly
does not want to attempt to prove Y, since it is a hypothesis. While it would be
sound to prove it and replace it by ⊤, it is useless at best, and usually irreparably
affects the system. Consider asserting the tautology (⊥ `with_tactic` τ) ==> ⊥.

Hence, F* splits such obligations only in strictly-positive positions. On all 
others, F* simply drops the with_tactic marker, e.g., by just unfolding the def- 
inition of with_tactic. For regular uses of the assert..by construct, however, all 
occurrences are strictly-positive. It is only when (expert) users use the with_tactic 
marker directly that the above discussion might become relevant. 
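The splitting procedure described in this section can be sketched over a toy formula language in Python. This is a model under several assumptions: formulae are limited to atoms, ⊤, conjunction, implication and universal quantification; the ambient context Δ is left implicit; tactics are represented by their names; and a marker nested inside a marked formula is split out recursively, leaving the inner skeleton as the goal of the outer tactic.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Top:
    pass


@dataclass(frozen=True)
class And:
    left: Any
    right: Any


@dataclass(frozen=True)
class Implies:
    hyp: Any
    concl: Any


@dataclass(frozen=True)
class Forall:
    var: str
    ty: str
    body: Any


@dataclass(frozen=True)
class WithTactic:
    phi: Any
    tactic: str


def strip(f):
    """Drop every marker; used for non-strictly-positive positions."""
    if isinstance(f, WithTactic):
        return strip(f.phi)
    if isinstance(f, And):
        return And(strip(f.left), strip(f.right))
    if isinstance(f, Implies):
        return Implies(strip(f.hyp), strip(f.concl))
    if isinstance(f, Forall):
        return Forall(f.var, f.ty, strip(f.body))
    return f


def split(ctx: Tuple, f) -> Tuple[Any, List[Tuple[Tuple, Any, str]]]:
    """Return the skeleton of f and the list of (context, goal, tactic)."""
    if isinstance(f, WithTactic):
        inner, goals = split(ctx, f.phi)
        return Top(), goals + [(ctx, inner, f.tactic)]
    if isinstance(f, And):
        left, gl = split(ctx, f.left)
        right, gr = split(ctx, f.right)
        return And(left, right), gl + gr
    if isinstance(f, Implies):
        hyp = strip(f.hyp)  # hypotheses are never split, only unmarked
        concl, goals = split(ctx + (hyp,), f.concl)
        return Implies(hyp, concl), goals
    if isinstance(f, Forall):
        body, goals = split(ctx + ((f.var, f.ty),), f.body)
        return Forall(f.var, f.ty, body), goals
    return f, []


X, R, S = Atom("X"), Atom("R"), Atom("S")
vc0 = Implies(X, Forall("x", "t", And(WithTactic(R, "tau1"), Implies(R, S))))
vc2, split_goals = split((), vc0)
```

On this input, vc2 corresponds to the skeleton VC2, and split_goals contains a single entry corresponding to VC1: the goal R for tactic tau1, in a context extended with the hypothesis X and the binder x:t.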

Formally, the soundness of this whole approach is given by the following 
metatheorem, which justifies the splitting out of sub-assertions, and by the cor- 
rectness of evolution detailed in Sect. 4.1. The proof of Theorem 1 is straightfor- 
ward, and included in the appendix. We expect an analogous property to hold 
in other verifiers as well (in particular, it holds for first-order logic). 


Theorem 1. Let E be a context with Γ ⊢ E : prop ⇒ prop, and φ a squashed
proposition such that Γ ⊢ φ : prop. Then the following holds:

\[
\frac{\Gamma \models E[\top] \qquad \Gamma, \gamma(E) \models \phi}{\Gamma \models E[\phi]}
\]

where γ(E) is the set of binders E introduces. If E is strictly-positive, then the
reverse implication holds as well.


5 Executing Metaprograms Efficiently 


F* provides three complementary mechanisms for running metaprograms. The 
first two, F*’s call-by-name (CBN) interpreter and a (newly implemented) call- 
by-value (CBV) NbE-based evaluator, support strong reduction—henceforth we 
refer to these as “normalizers”. In addition, we design and implement a new 
native plugin mechanism that allows both normalizers to interface with Meta- 
F* programs extracted to OCaml, reusing F*’s existing extraction pipeline for 
this purpose. Below we provide a brief overview of the three mechanisms.


5.1 CBN and CBV Strong Reductions


As described in Sect. 3.1, metaprograms, once reified, are simply F* terms of
type proofstate → Div (result a). As such, they can be reduced using F*’s existing
computation machinery, a CBN interpreter for strong reductions based on the 
Krivine abstract machine (KAM) [24,46]. Although complete and highly con- 
figurable, F*’s KAM interpreter is slow, designed primarily for converting types 
during dependent type-checking and higher-order unification. 

Shifting focus to long-running metaprograms, such as tactics for proofs by 
reflection, we implemented an NbE-based strong-reduction evaluator for F* com- 
putations. The evaluator is implemented in F* and extracted to OCaml (as is 
the rest of F*), thereby inheriting CBV from OCaml. It is similar to Boespflug 
et al.’s [16] NbE-based strong-reduction for Coq, although we do not implement 
their low-level, OCaml-specific tag-elimination optimizations—nevertheless, it is 
already vastly more efficient than the KAM-based interpreter. 
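For readers unfamiliar with normalization by evaluation (NbE), the following Python sketch shows the general technique on the untyped lambda calculus with de Bruijn indices. It is a textbook-style illustration, not F*’s evaluator: terms are evaluated into host-language closures (inheriting the host’s call-by-value order), and strong reduction is obtained by applying closures to fresh neutral variables and reading the result back.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


# Syntax: de Bruijn-indexed lambda terms.
@dataclass(frozen=True)
class Var:
    idx: int


@dataclass(frozen=True)
class Lam:
    body: Any


@dataclass(frozen=True)
class App:
    fn: Any
    arg: Any


# Semantic values: host closures and neutral (stuck) terms.
@dataclass(frozen=True)
class VLam:
    fn: Callable[[Any], Any]


@dataclass(frozen=True)
class NVar:
    level: int  # free variable, identified by de Bruijn level


@dataclass(frozen=True)
class NApp:
    head: Any
    arg: Any


def evaluate(t, env: List[Any]):
    """Evaluate t, where env[i] is the value of de Bruijn index i."""
    if isinstance(t, Var):
        return env[t.idx]
    if isinstance(t, Lam):
        return VLam(lambda v: evaluate(t.body, [v] + env))
    f = evaluate(t.fn, env)
    a = evaluate(t.arg, env)  # argument evaluated eagerly (call-by-value)
    return f.fn(a) if isinstance(f, VLam) else NApp(f, a)


def reify(v, depth: int):
    """Read a value back into a normal form; depth counts enclosing binders."""
    if isinstance(v, VLam):
        # Go under the binder by applying the closure to a fresh variable.
        return Lam(reify(v.fn(NVar(depth)), depth + 1))
    if isinstance(v, NVar):
        return Var(depth - v.level - 1)
    return App(reify(v.head, depth), reify(v.arg, depth))


def normalize(t):
    return reify(evaluate(t, []), 0)
```

For example, normalize reduces the redex under the binder in λx. (λy. y) x, giving λx. x, which a weak-reduction evaluator would leave untouched.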


5.2 Native Plugins and Multi-language Interoperability 


Since Meta-F* programs are just F* programs, they can also be extracted to 
OCaml and natively compiled. Further, they can be dynamically linked into 
F* as “plugins”. Plugins can be directly called from the type-checker, as is 
done for the primitives, which is much more efficient than interpreting them. 
However, compilation has a cost, and it is not convenient to compile every sin- 
gle invocation. Instead, Meta-F* enables users to choose which metaprograms 
are to be plugins (presumably those expected to be computation-intensive, e.g. 
canon_semiring). Users can choose their native plugins, while still quickly scripting 
their higher-level logic in the interpreter. 

This requires (for higher-order metaprograms) a form of multi-language inter- 
operability, converting between representations of terms used in the normalizers 
and in native code. We designed a small multi-language calculus, with ML-style 
polymorphism, to model the interaction between normalizers and plugins and 
conversions between terms. See the appendix for details. 

Beyond the notable efficiency gains of running compiled code vs. interpreting 
it, native metaprograms also require fewer embeddings. Once compiled, metapro- 
grams work over the internal, concrete types for proofstate, term, etc., instead 
of over their F* representations (though still treating them abstractly). Hence, 
compiled metaprograms can call primitives without needing to embed their argu- 
ments or unembed their results. Further, they can call each other directly as well. 
Indeed, there is little operational difference between a primitive
and a compiled metaprogram used as a plugin. 
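The cost of embeddings mentioned above can be pictured with a small Python sketch. All names here are hypothetical and the term representation is a toy one (tagged tuples): a function called from a normalizer must first unembed its argument from syntax and embed its result back, whereas native code calling native code skips both steps.

```python
from typing import Callable, List, Optional, Tuple

Term = Tuple  # toy syntax: ("Int", n) or ("List", [terms])


def embed_int(n: int) -> Term:
    return ("Int", n)


def unembed_int(t: Term) -> Optional[int]:
    return t[1] if t[0] == "Int" else None


def embed_list(xs: List[int]) -> Term:
    return ("List", [embed_int(x) for x in xs])


def unembed_list(t: Term) -> Optional[List[int]]:
    if t[0] != "List":
        return None
    items = [unembed_int(x) for x in t[1]]
    return None if any(i is None for i in items) else items


def native_sum(xs: List[int]) -> int:
    """Stands in for a compiled metaprogram working on native values."""
    return sum(xs)


def call_from_normalizer(f: Callable[[List[int]], int], arg: Term) -> Optional[Term]:
    """Normalizer-to-native boundary: unembed, run natively, embed the result."""
    xs = unembed_list(arg)
    return None if xs is None else embed_int(f(xs))


via_normalizer = call_from_normalizer(native_sum, embed_list([1, 2, 3]))
direct = native_sum([1, 2, 3])  # native-to-native call: no embeddings needed
```

Unembedding can fail (here, by returning None) when the syntax does not have the expected shape, which is one reason the boundary is worth crossing as rarely as possible.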

Native plugins, however, are not a replacement for the normalizers, for sev- 
eral reasons. First, the overhead in compilation might not be justified by the 
execution speed-up. Second, extraction to OCaml erases types and proofs. As 
a result, the F* interface of the native plugins can only contain types that can 
also be expressed in OCaml, thereby excluding full dependent types (internally,
however, they can be dependently typed). Third, being OCaml programs, native
plugins do not support reducing open terms, which is often required. However,
when the programs treat their open arguments parametrically, relying on para- 
metric polymorphism, the normalizers can pass such arguments as-is, thereby 
recovering open reductions in some cases. This allows us to use native datastruc- 
ture implementations (e.g. List), which is much faster than using the normalizers, 
even for open terms. See the appendix for details. 


6 Experimental Evaluation 


We now present an experimental evaluation of Meta-F*. First, we provide bench- 
marks comparing our reflective canonicalizer from Sect. 2.1 to calling the SMT 
solver directly without any canonicalization. Then, we return to the parsers and
serializers from Sect. 2.3 and show how, for VCs that arise, a domain-specific
tactic is much more tractable than an SMT-only proof.


6.1 A Reflective Tactic for Partial Canonicalization 


In Sect. 2.1, we have described the canon_semiring tactic that rewrites
semiring expressions into sums of products. We find that this tactic significantly
improves proof robustness. The table below compares the success rates and
times for the poly_multiply lemma from Sect. 2.1. To test the robustness of each
alternative, we run the tests 200 times while varying the SMT solver’s
random seed. The smtix rows represent asking the solver to prove the lemma
without any help from tactics, where i represents the resource limit (rlimit)
multiplier given to the solver. This rlimit is memory-allocation based and
independent of the particular system or current load. For the interp and
native rows, the canon_semiring tactic is used, running it using F*’s KAM
normalizer and as a native plugin respectively, both with an rlimit of 1.
For each setup, we display the success rate of verification, the average (CPU)
time taken for the SMT queries (not counting the time for parsing/processing
the theory) with its standard deviation, and the average total time (its standard
deviation coincides with that of the queries). When applicable, the time for
tactic execution (which is independent of the seed) is displayed.

|         | Rate  | Queries       | Tactic | Total |
| smt1x   | 0.5%  | 0.216 ± 0.001 | -      | 2.937 |
| smt2x   | 2%    | 0.265 ± 0.003 | -      | 2.958 |
| smt3x   | 4%    | 0.304 ± 0.004 | -      | 3.022 |
| smt6x   | 10%   | 0.401 ± 0.008 | -      | 3.158 |
| smt12x  | 12.5% | 0.596 ± 0.031 | -      | 3.321 |
| smt25x  | 16.5% | 1.063 ± 0.079 | -      | 3.790 |
| smt50x  | 29%   | 291940.930    | -      | 3.030 |
| smt100x | 24%   | 5.831 ± 0.776 | -      | 8.550 |
| interp  | 100%  | 0.141 ± 0.001 | 1.156  | 4.003 |
| native  | 100%  | 0.139 ± 0.001 | 0.212  | 3.071 |

The smt rows show very poor success
rates: even when upping the rlimit to a whopping 100x, over three quarters of
the attempts fail. Note how the (relative) standard deviation increases with the
rlimit: this is due to successful runs taking rather random times, and failing
ones exhausting their resources in similar times. The setups using the tactic show
a clear increase in robustness: canonicalizing the assertion causes this proof to
always succeed, even at the default rlimit. We recall that the tactic variants
still leave goals for SMT solving, namely, the skeleton for the original VC and
the canonicalized equality left by the tactic, easily dischargeable by the SMT
solver through much more well-behaved linear reasoning. The last column shows
that native compilation speeds up this tactic’s execution by about 5x.
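As a quick check of the “about 5x” figure, using only the tactic and total times reported for the interp and native rows:

```latex
\[
\frac{t^{\text{interp}}_{\text{tactic}}}{t^{\text{native}}_{\text{tactic}}}
  = \frac{1.156}{0.212} \approx 5.45,
\qquad
\frac{t^{\text{interp}}_{\text{total}}}{t^{\text{native}}_{\text{total}}}
  = \frac{4.003}{3.071} \approx 1.30
\]
```

The total time improves by a smaller factor than the tactic time, since it also includes the SMT queries and the remaining processing, which are essentially unaffected by native compilation.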


6.2 Combining SMT and Tactics for the Parser Generator 


In Sect. 2.3, we presented a library of combinators and a metaprogramming 
approach to automate the construction of verified, mutually inverse, low-level 
parsers and serializers from type descriptions. Beyond generating the code, tac- 
tics are used to process and discharge proof obligations that arise when using the 
combinators. 

We present three strategies for discharging these obligations, including those 
of bijectivity that arise when constructing parsers and serializers for enumer- 
ated types. First, we used F*’s default strategy to present all of these proofs 
directly to the SMT solver. Second, we programmed a ~100 line tactic to dis- 
charge these proofs without relying on the SMT solver at all. Finally, we used 
a hybrid approach where a simple, 5-line tactic is used to prune the context of 
the proof removing redundant facts before presenting the resulting goals to the 
SMT solver. 


The table below shows the total time in seconds for verifying
metaprogrammed low-level parsers and serializers for enumerations of different sizes.

| Size | SMT only | Tactic only | Hybrid |
| 4    | 178      | 17.3        | 6.6    |
| 7    | 468      | 38.3        | 9.8    |
| 10   | 690      | 63.0        | 19.4   |

In short, the hybrid approach scales the best; the tactic-only approach is
somewhat slower; while the SMT-only approach scales poorly and is an order of
magnitude slower. Our hybrid approach is very simple. With some more work,
a more sophisticated hybrid strategy could be more performant still, relying on
tactic-based normalization proofs for fragments of the VC best handled
computationally (where the SMT solver spends most of its time), while using SMT only
for integer arithmetic, congruence closure etc. However, with Meta-F*’s ability to
manipulate proof contexts programmatically, our simple context-pruning tactic
provides a big payoff at a small cost.
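The qualitative comparison can be made concrete by taking ratios of the reported times, row by row (smallest to largest enumeration):

```latex
\[
\frac{\text{SMT only}}{\text{Tactic only}}:\quad
\frac{178}{17.3} \approx 10.3,\qquad
\frac{468}{38.3} \approx 12.2,\qquad
\frac{690}{63.0} \approx 11.0
\]
\[
\frac{\text{Tactic only}}{\text{Hybrid}}:\quad
\frac{17.3}{6.6} \approx 2.6,\qquad
\frac{38.3}{9.8} \approx 3.9,\qquad
\frac{63.0}{19.4} \approx 3.2
\]
```

On these three data points, the SMT-only strategy is roughly ten to twelve times slower than the tactic-only one, which in turn is roughly three to four times slower than the hybrid.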


7 Related Work 


Many SMT-based program verifiers [7,8,19,34,48] rely on user hints, in the
form of assertions and lemmas, to complete proofs. This is the predominant
style of proving used in tools like Dafny [47], Liquid Haskell [60], Why3 [33], and
F* itself [58]. However, there is a growing trend to augment this style of
semi-automated proof with interactive proofs. For example, systems like Why3 [33]
allow VCs to be discharged using ITPs such as Coq, Isabelle/HOL, and PVS, 
but this requires an additional embedding of VCs into the logic of the ITP in 
question. In recent concurrent work, support for effectful reflection proofs was 
added to Why3 [50], and it would be interesting to investigate if this could also 
be done in Meta-F*. Grov and Tumas [39] present Tacny, a tactic framework for 
Dafny, which is, however, limited in that it only transforms source code, with the 
program verifier unchanged. In contrast, Meta-F* combines the benefits of an 
SMT-based program verifier and those of tactic proofs within a single language. 

Moving away from SMT-based verifiers, ITPs have long relied on separate 
languages for proof scripting, starting with Edinburgh LCF [37] and ML, and 
continuing with HOL, Isabelle and Coq, which are either extensible via ML, 
or have dedicated tactic languages [3,29,56,62]. Meta-F* builds instead on a 
recent idea in the space of dependently typed ITPs [22,30,42,63] of reusing the 
object-language as the meta-language. This idea first appeared in Mtac, a Coq- 
based tactics framework for Coq [42,63], and has many generic benefits including 
reusing the standard library, IDE support, and type checker of the proof assis- 
tant. Mtac can additionally check the partial correctness of tactics, which is also 
sometimes possible in Meta-F* but still rather limited (Sect. 3.4). Meta-F*’s
design is instead more closely inspired by the metaprogramming frameworks of 
Idris [22] and Lean [30], which provide a deep embedding of terms that metapro- 
grams can inspect and construct at will without dependent types getting in the 
way. However, F*’s effects, its weakest precondition calculus, and its use of SMT 
solvers distinguish Meta-F* from these other frameworks, presenting both chal- 
lenges and opportunities, as discussed in this paper. 

Some SMT solvers also include tactic engines [27], which allow queries to be
processed in custom ways. However, using SMT tactics from a program verifier is
not very practical. To do so effectively, users must become familiar not only with 
the solver’s language and tactic engine, but also with the translation from the 
program verifier to the solver. Instead, in Meta-F*, everything happens within 
a single language. Also, to our knowledge, these tactics are usually
coarse-grained, and we do not expect them to enable developments such as the one in Sect. 2.2.
Plus, SMT tactics do not enable metaprogramming. 

Finally, ITPs are seeing increasing use of “hammers” such as Sledgeham- 
mer [14,15,54] in Isabelle/HOL, and similar tools for HOL Light and HOL4 [43], 
and Mizar [44], to interface with ATPs. This technique is similar to Meta-F*,
which, given its support for a dependently typed logic, is especially related to
a recent hammer for Coq [26]. Unlike these hammers, Meta-F* does not aim
to reconstruct SMT proofs, gaining efficiency at the cost of trusting the SMT 
solver. Further, whereas hammers run in the background, lightening the load on 
a user otherwise tasked with completing the entire proof, Meta-F* relies more 
heavily on the SMT solver as an end-game tactic in nearly all proofs.


8 Conclusions


A key challenge in program verification is to balance automation and expres- 
siveness. Whereas tactic-based ITPs support highly expressive logics, the tactic 
author is responsible for all the automation. Conversely, SMT-based program 
verifiers provide good, scalable automation for comparatively weaker logics, but 
offer little recourse when verification fails. A design that allows picking the right 
tool, at the granularity of each verification sub-task, is a worthy area of research. 
Meta-F* presents a new point in this space: by using hand-written tactics along- 
side SMT-automation, we have written proofs that were previously impractical 
in F*, and (to the best of our knowledge) in other SMT-based program verifiers. 
Abstract. Research into C verification often ignores that the C standard
leaves the evaluation order of expressions unspecified, and assigns undefined
behavior to write-write or read-write conflicts in subexpressions
(so-called “sequence point violations”). These aspects should be accounted
for in verification because C compilers exploit them.

We present a verification condition generator (vcgen) that enables one 
to semi-automatically prove the absence of undefined behavior in a given 
C program for any evaluation order. The key novelty of our approach is 
a symbolic execution algorithm that computes a frame at the same time 
as a postcondition. The frame is used to automatically determine how 
resources should be distributed among subexpressions. 

We prove correctness of our vcgen with respect to a new monadic
definitional semantics of a subset of C. This semantics is modular and gives
a concise account of non-determinism in C.

We have implemented our vcgen as a tactic in the Coq interactive theorem
prover, and have proved correctness of it using a separation logic
for the new monadic definitional semantics of a subset of C.


1 Introduction 


The ISO C standard [22]—the official specification of the C language—leaves 
many parts of the language semantics either unspecified (e.g., the order of evalu- 
ation of expressions), or undefined (e.g., dereferencing a NULL pointer or integer 
overflow). In case of undefined behavior a program may do literally anything, 
e.g., it may crash, or it may produce an arbitrary result and side-effects. There- 
fore, to establish the correctness of a C program, one needs to ensure that the 
program has no undefined behavior for all possible choices of non-determinism 
due to unspecified behavior. 

In this paper we focus on the undefined and unspecified behaviors related to 
C’s expression semantics, which have been ignored by most existing verification 
tools, but are crucial for establishing the correctness of realistic C programs. The 
C standard does not require subexpressions to be evaluated in a specific order 
(e.g., from left to right), but rather allows them to be evaluated in any order. 
Moreover, an expression has undefined behavior when there is a conflicting
write-write or read-write access to the same location between two sequence points [22,
6.5p2] (a so-called “sequence point violation”). Sequence points occur, e.g., at the
end of a full expression (;), before and after each function call, and after the
first operand of a conditional expression (- ? - : -) has been evaluated [22,
Annex C]. Let us illustrate this by means of the following example:


#include <stdio.h>

int main() {
  int x; int y = (x = 3) + (x = 4);
  printf("%d %d\n", x, y);
}


Due to the unspecified evaluation order, one would naively expect this program 
to print either “3 7” or “4 7”, depending on which assignment to x was evalu- 
ated first. But this program exhibits undefined behavior due to a sequence point 
violation: there are two conflicting writes to the variable x. Indeed, when com- 
piled with GCC (version 8.2.0), the program in fact prints “4 8”, which does 
not correspond to the expected results of any of the evaluation orders. 
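
The naive expectation can be made concrete with a small Python sketch that enumerates both evaluation orders of the two assignments, treating each assignment as an ordinary store. The helper name `run_order` is hypothetical and introduced only for this illustration; the sketch models the naive reading of the program, not the semantics of the C standard, under which the program has undefined behavior.

```python
from itertools import permutations

def run_order(order):
    """Naively evaluate (x = 3) + (x = 4), executing the operands in `order`."""
    store = {}
    results = {}
    for value in order:
        store["x"] = value        # the assignment writes x ...
        results[value] = value    # ... and evaluates to the assigned value
    y = results[3] + results[4]
    return store["x"], y

# Both orders: x = 3 first, or x = 4 first.
outcomes = {run_order(order) for order in permutations((3, 4))}
print(sorted(outcomes))  # [(3, 7), (4, 7)]
```

Neither order yields the pair (4, 8) reported above, which is why the observed output cannot be explained by the choice of evaluation order alone.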

One may expect that these programs can be easily ruled out statically using 
some form of static analysis, but this is not the case. Contrary to the simple pro- 
gram above, one can access the values of arbitrary pointers, making it impossible 
to statically establish the absence of write-write or read-write conflicts. Besides, 
one should not merely establish the absence of undefined behavior due to con- 
flicting accesses to the same locations, but one should also establish that there 
are no other forms of undefined behavior (e.g., that no NULL pointers are deref- 
erenced) for any evaluation order. 

To deal with this issue, Krebbers [29,30] developed a program logic based on 
Concurrent Separation Logic (CSL) [46] for establishing the absence of undefined 
behavior in C programs in the presence of non-determinism. To get an impression 
of how his logic works, let us consider the rule for the addition operator: 


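\[
\frac{\{P_1\}\ e_1\ \{\Psi_1\} \qquad \{P_2\}\ e_2\ \{\Psi_2\} \qquad \forall v_1\, v_2.\ \Psi_1\, v_1 \ast \Psi_2\, v_2 \vdash \Phi\,(v_1 + v_2)}
     {\{P_1 \ast P_2\}\ e_1 + e_2\ \{\Phi\}}
\]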


This rule is much like the rule for parallel composition in CSL: the precondition
should be separated into two parts $P_1$ and $P_2$ describing the resources needed for
proving the Hoare triples of both operands. Crucially, since $P_1$ and $P_2$ describe
disjoint resources, as expressed by the separating conjunction $\ast$, it is guaranteed
that $e_1$ and $e_2$ do not interfere with each other, and hence cannot cause sequence
point violations. The purpose of the rule’s last premise is to ensure that for all
possible return values $v_1$ and $v_2$, the postconditions $\Psi_1$ and $\Psi_2$ of both operands
can be combined into the postcondition $\Phi$ of the whole expression.
Krebbers’s logic [29,30] has some limitations that impact its usability: 


— The rules are not algorithmic, and hence it is not clear how they could be 
implemented as part of an automated or interactive tool. 

— It is difficult to extend the logic with new features. Soundness was proven 
with respect to a monolithic and ad-hoc model of separation logic. 
In this paper we address both of these problems.

We present a new algorithm for symbolic execution in separation logic. Contrary
to ordinary symbolic execution in separation logic [5], our symbolic executor
takes an expression and a precondition as its input, and computes not only
the postcondition, but also simultaneously computes a frame that describes the
resources that have not been used to prove the postcondition. The frame is used
to infer the pre- and postconditions of adjacent subexpressions. For example, in
e1 + e2, we use the frame of e1 to symbolically execute e2.
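
To give a rough feeling for what "computing a frame" means, here is a minimal Python sketch. It is an illustration under strong assumptions and not the algorithm of this paper: resources are modeled as a dictionary from locations to a (fraction, value) pair, only load expressions are handled, and the policy of giving half of the available share to each load is an assumption of the sketch. The names `forward_load` and `forward_add` are hypothetical.

```python
from fractions import Fraction

def forward_load(loc, resources):
    """Symbolically execute `*loc`; return (value, postcondition, frame)."""
    if loc not in resources:
        return None                    # the executor gives up; the user must help
    q, value = resources[loc]
    half = q / 2
    post = {loc: (half, value)}        # share consumed by this load
    frame = dict(resources)
    frame[loc] = (half, value)         # share left over for adjacent subexpressions
    return value, post, frame

def forward_add(loc1, loc2, resources):
    """Symbolically execute `*loc1 + *loc2`, threading the frame through."""
    first = forward_load(loc1, resources)
    if first is None:
        return None
    v1, post1, frame1 = first
    second = forward_load(loc2, frame1)   # e2 is executed on the frame of e1
    if second is None:
        return None
    v2, post2, frame2 = second
    return v1 + v2, (post1, post2), frame2

pre = {"l": (Fraction(1), 1)}
value, posts, frame = forward_add("l", "l", pre)
print(value, frame)
```

In this sketch the shares in the two postconditions and in the final frame add up to the share in the precondition, so no resource is duplicated or lost.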

In order to enable semi-automated reasoning about C programs, we integrate
our symbolic executor into a verification condition generator (vcgen). Our vcgen
does not merely turn programs into proof goals, but constructs the proof goals
only as long as it can discharge goals automatically using our symbolic executor.
When an attempt to use the symbolic executor fails, our vcgen will return a new
goal, from which the vcgen can be called back again after the user has helped out.
This approach is useful when integrated into an interactive theorem prover.

We prove soundness of the symbolic executor and verification condition gener- 
ator with respect to a refined version of the separation logic by Krebbers [29,30]. 
Our new logic has been developed on top of the Iris framework [24–26, 33], and
thereby inherits all advanced features of Iris (like its expressive support for ghost 
state and invariants), without having to model these explicitly. To make our new 
logic better suited for proving the correctness of the symbolic executor and ver- 
ification condition generator, our new logic comes with a weakest precondition 
connective instead of Hoare triples as in Krebbers’s original logic. 

To streamline the soundness proof of our new program logic, we give a new 
monadic definitional translation of a subset of C relevant for non-determinism 
and sequence points into an ML-style functional language with concurrency. 
Contrary to the direct style operational semantics for a subset of C by Kreb- 
bers [29,30], our approach leads to a semantics that is both easier to understand, 
and easier to extend with additional language features. 

We have mechanized our whole development in the Coq interactive theorem 
prover. The symbolic executor and verification condition generator are defined 
as computable functions in Coq, and have been integrated into tactics in the 
Iris Proof Mode/MoSeL framework [32,34]. To obtain end-to-end correctness, 
we mechanized the proofs of soundness of our symbolic executor and verification 
condition generator with respect to our new separation logic and new monadic 
definitional semantics for a subset of C. The Coq development is available at [18]. 


Contributions. We describe an approach to semi-automatically prove the 
absence of undefined behavior in a given C program for any evaluation order. 
While doing so, we make the following contributions: 


— We define AMC: a small C-style language with a semantics by a monadic 
translation into an ML-style functional language with concurrency (Sect. 2); 

— We present a separation logic with weakest preconditions for AMC based on 
the separation logic for non-determinism in C by Krebbers [29,30] (Sect. 3); 
— We prove soundness of our separation logic with weakest preconditions by
giving a modular model using the Iris framework [24–26,33] (Sect. 4);

— We present a new symbolic executor that not only computes the postcondition 
of a C expression, but also a frame, used to determine how resources should 
be distributed among subexpressions (Sect. 5); 

— On top of our symbolic executor, we define a verification condition genera- 
tor that enables semi-automated proofs using an interactive theorem prover 
(Sect. 6); 

— We demonstrate that our approach can be implemented and proved sound 
using Coq for a superset of the AMC language considered in this paper 
(Sect. 7). 


2 AMC: A Monadic Definitional Semantics of C 


In this section we describe a small C-style language called AMC, which features
non-determinism in expressions. We define its semantics by translation into an
ML-style functional language with concurrency called HeapLang.

We briefly describe the AMC source language (Sect. 2.1) and the HeapLang 
target language (Sect. 2.2) of the translation. Then we describe the translation 
scheme itself (Sect. 2.3). We explain in several steps how to exploit concurrency 
and monadic programming to give a concise and clear definitional semantics. 


2.1 The Source Language AMC 


The syntax of our source language called AMC is as follows: 


v ∈ val  ::= z | f | l | NULL | (v1, v2) | ()                  (z ∈ ℤ, l ∈ Loc)
e ∈ expr ::= v | x | (e1, e2) | e.1 | e.2 | e1 ⊚ e2 |          (⊚ ∈ {+, −, ...})
             x ← e1; e2 | if (e1) {e2} {e3} | while (e1) {e2} | e1(e2) |
             alloc(e) | *e | e1 = e2 | free(e)


The values include integers, NULL pointers, concrete locations l, function pointers
f, structs with two fields (tuples), and the unit value () (for functions without
return value). There is a global list of function definitions, where each definition
is of the form f(x){e}. Most of the expression constructs resemble standard C
notation, with some exceptions. We do not differentiate between expressions and
statements to keep our language uniform. As such, if-then-else and sequencing
constructs are not duplicated for both expressions and statements. Moreover, we
do not differentiate between lvalues and rvalues [22, 6.3.2.1]. Hence, there is no
address operator &, and, similarly to ML, the load (*e) and assignment (e1 = e2)
operators take a reference as their first argument.

The sequenced bind operator x ← e1; e2 generalizes the normal sequencing
operator e1; e2 of C by binding the result of e1 to the variable x in e2. As such,
x ← e1; e2 can be thought of as the declaration of an immutable local variable
x. We omit mutable local variables for now, but these can be easily added as an
extension to our method, as shown in Sect. 7. We write e1; e2 for a sequenced
bind _ ← e1; e2 in which we do not care about the return value of e1.

To focus on the key topics of the paper—non-determinism and the sequence 
point restriction—we take a minimalistic approach and omit most other features 
of C. Notably, we omit non-local control (return, break, continue, and goto). Our 
memory model is simplified; it only supports structs with two fields (tuples), 
but no arrays, unions, or machine integers. In Sect.7 we show that some of 
these features (arrays, pointer arithmetic, and mutable local variables) can be 
incorporated. 


2.2 The Target Language HeapLang 


The target language of our definitional semantics of AMC is an ML-style func- 
tional language with concurrency primitives and a call-by-value semantics. This 
language, called HeapLang, is included as part of the Iris Coq development [21]. 
The syntax is as follows: 


v ∈ Val  ::= z | true | false | rec f x = e | ℓ | () | ...       (z ∈ ℤ, ℓ ∈ Loc)
e ∈ Expr ::= v | x | e1 e2 | ref(e) | !_HL e | e1 :=_HL e2 | assert(e) |
             e1 ||_HL e2 | newmutex | acquire | release | ...


The language contains some concurrency primitives that we will use to model
non-determinism in AMC. Those primitives are (||_HL), newmutex, acquire, and
release. The first primitive is the parallel composition operator, which executes
expressions e1 and e2 in parallel, and returns a tuple of their results. The
expression newmutex () creates a new mutex. If lk is a mutex that was created this way,
then acquire lk tries to acquire it and blocks until no other thread is using lk.
An acquired mutex can be released using release lk.


2.3 The Monadic Definitional Semantics of AMC 


We now give the semantics of AMC by translation into HeapLang. The transla- 
tion is carried out in several stages, each iteration implementing and illustrating 
a specific aspect of C. First, we model non-determinism in expressions by con- 
currency, parallelizing execution of subexpressions (step 1). After that, we add 
checks for sequence point violations in the translation of the assignment and 
dereferencing operations (step 2). Finally, we add function calls and demonstrate 
how the translation can be simplified using a monadic notation (step 3). 
Step 1: Non-determinism via Parallel Composition. We model the
unspecified evaluation order in binary expressions like e1 + e2 and e1 = e2 by
executing the subexpressions in parallel using the (||_HL) operator:


⟦e1 + e2⟧ ≜ let (v1, v2) = ⟦e1⟧ ||_HL ⟦e2⟧ in v1 +_HL v2
⟦e1 = e2⟧ ≜ let (v1, v2) = ⟦e1⟧ ||_HL ⟦e2⟧ in
            match v1 with
            | None → assert(false)              (* NULL pointer *)
            | Some l → match !_HL l with
                       | None → assert(false)   (* Use after free *)
                       | Some _ → l :=_HL Some v2; v2


Since our memory model is simple, the value interpretation is straightforward:


⟦z⟧_val ≜ z  (if z ∈ ℤ)                  ⟦NULL⟧_val ≜ None
⟦(v1, v2)⟧_val ≜ (⟦v1⟧_val, ⟦v2⟧_val)      ⟦()⟧_val ≜ ()      ⟦l⟧_val ≜ Some l


The only interesting case is the translation of locations. Since there is no
concept of a NULL pointer in HeapLang, we use the option type to distinguish NULL
pointers from concrete locations (l). The interpretation of assignments thus
contains a pattern match to check that no NULL pointers are dereferenced. A similar
check is performed in the interpretation of the load operation (*e). Moreover,
each location contains an option to distinguish freed from active locations.


Step 2: Sequence Points. So far we have not accounted for undefined behavior
due to sequence point violations. For instance, the program (x = 3) + (x = 4)
gets translated into a HeapLang expression that updates the value of the
location x non-deterministically to either 3 or 4, and returns 7. However, in
C, the behavior of this program is undefined, as it exhibits a sequence point
violation: there is a write conflict for the location x.

To give a semantics for sequence point violations, we follow the approach
by Norrish [44], Ellison and Rosu [17], and Krebbers [29,30]. We keep track of
a set of locations that have been written to since the last sequence point. We
refer to this set as the environment of our translation, and represent it using a
global variable env of the type mset Loc. Because our target language HeapLang
is concurrent, all updates to the environment env must be executed atomically,
i.e., inside a critical section, which we enforce by employing a global mutex lk.
The interpretation of assignments e1 = e2 now becomes:
ret e         ≜  λ _ _. e
e1 || e2      ≜  λ env lk. (e1 env lk) ||_HL (e2 env lk)
x ← e1; e2    ≜  λ env lk. let x = e1 env lk in e2 env lk
atomic_env e  ≜  λ env lk. acquire lk; let a = e env in release lk; a
atomic e      ≜  λ env lk. acquire lk; let a = e env (newmutex ()) in release lk; a
run(e)        ≜  e (mset_create ()) (newmutex ())


Fig. 1. The monadic combinators.


⟦e1 = e2⟧ ≜ let (v1, v2) = ⟦e1⟧ ||_HL ⟦e2⟧ in
            acquire lk;
            match v1 with
            | None → assert(false)                 (* NULL pointer *)
            | Some l →
              assert(¬ mset_member l env);         (* Seq. point violation *)
              match !_HL l with
              | None → assert(false)               (* Use after free *)
              | Some _ → mset_add l env; l :=_HL Some v2;
            release lk; v2


Whenever we assign to (or read from) a location l, we check if the location l
is not already present in the environment env. If the location l is present, then
it was already written to since the last sequence point. Hence, accessing the
location constitutes undefined behavior (see the assert in the interpretation of
assignments above). In the interpretation of assignments, we furthermore insert
the location l into the environment env.

In order to make sure that one can access a variable again after a sequence
point, we define the sequenced bind operator x ← e1; e2 as follows:


⟦x ← e1; e2⟧ ≜ let x = ⟦e1⟧ in acquire lk; mset_clear env; release lk; ⟦e2⟧


After we finished executing the expression e1, we clear the environment env, so
that all locations are accessible in e2 again.
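
The bookkeeping of this step can be mimicked in a few lines of Python. This is only a sketch under simplifying assumptions: it is single-threaded, so the mutex lk is omitted; the NULL and use-after-free checks are left out; and the names `Machine`, `SeqPointViolation`, and `violates` are hypothetical. An exception stands in for reaching assert(false).

```python
class SeqPointViolation(Exception):
    """Raised where the translation would reach assert(false)."""

class Machine:
    def __init__(self):
        self.heap = {}     # location -> value
        self.env = set()   # locations written since the last sequence point

    def store(self, loc, value):
        if loc in self.env:
            raise SeqPointViolation(f"conflicting access to {loc}")
        self.env.add(loc)              # corresponds to mset_add l env
        self.heap[loc] = value
        return value

    def load(self, loc):
        if loc in self.env:
            raise SeqPointViolation(f"conflicting access to {loc}")
        return self.heap[loc]

    def sequence_point(self):
        self.env.clear()               # corresponds to mset_clear env

def violates(order):
    """Run (x = 3) + (x = 4) with the two stores executed in `order`."""
    machine = Machine()
    try:
        for value in order:
            machine.store("x", value)
    except SeqPointViolation:
        return True
    return False

print(violates((3, 4)), violates((4, 3)))  # True True

m = Machine()
m.store("x", 3)
m.sequence_point()   # the ";" in x = 3; x = 4
m.store("x", 4)
m.sequence_point()
print(m.load("x"))   # 4
```

Both orders of the two conflicting stores are rejected, whereas the same stores separated by a sequence point are accepted.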


Step 3: Non-interleaved Function Calls. As the final step, we present
the correct translation scheme for function calls. Unlike the other expressions,
function calls are not interleaved during the execution of subexpressions [22,
6.5.2.2p10]. For instance, in the program f() + g() the possible orders of
execution are: either all the instructions in f() followed by all the instructions in
g(), or all the instructions in g() followed by all the instructions in f().
⟦e1 + e2⟧ ≜ (v1, v2) ← ⟦e1⟧ || ⟦e2⟧; ret (v1 +_HL v2)
⟦e1 = e2⟧ ≜ (v1, v2) ← ⟦e1⟧ || ⟦e2⟧;
            atomic_env (λ env.
              match v1 with
              | None → assert(false)               (* NULL pointer *)
              | Some l →
                assert(¬ mset_member l env);       (* Seq. point violation *)
                match !_HL l with
                | None → assert(false)             (* Use after free *)
                | Some _ → mset_add l env; l :=_HL Some v2; ret v2)
⟦x ← e1; e2⟧ ≜ x ← ⟦e1⟧; _ ← (atomic_env mset_clear); ⟦e2⟧
⟦e1(e2)⟧ ≜ (f, a) ← ⟦e1⟧ || ⟦e2⟧; atomic (atomic_env mset_clear; f a)
⟦f(x){e}⟧ ≜ let rec f x = v ← ⟦e⟧; _ ← (atomic_env mset_clear); ret v


Fig. 2. Selected clauses from the monadic definitional semantics.


To model this, we execute each function call atomically. In the previous step
we used a global mutex for guarding the access to the environment. We could use
that mutex for function calls too. However, reusing a single mutex for entering
each critical section would not work because the body of a function may contain
invocations of other functions. To that end, we use multiple mutexes to reflect
the hierarchical structure of function calls.

To handle multiple mutexes, each C expression is interpreted as a HeapLang
function that receives a mutex and returns its result. That is, each C expression
is modeled by a monadic expression in the reader monad M(A) ≜ mset Loc →
mutex → A. For consistency’s sake, we now also use the monad to thread through
the reference to the environment (mset Loc), instead of using a global variable
env as we did in the previous step.

We use a small set of monadic combinators, shown in Fig. 1, to build the 
translation in a more abstract way. The return and bind operators are standard 
for the reader monad. The parallel operator runs two monadic expressions con- 
currently, propagating the environment and the mutex. The atomic combinator 
invokes a monadic expression with a fresh mutex. The atomic_env combinator 
atomically executes its body with the current environment as an argument. The 
run function executes the monadic computation by instantiating it with a fresh 
mutex and a new environment. Selected clauses for the translation are presented 
in Fig. 2. The translation of the binary operations remains virtually unchanged, 
except for the usage of monadic parallel composition instead of the standard one. 
The translation for the assignment and the sequenced bind uses the atomic_env
combinator for querying and updating the environment. We also have to adapt
our translation of values, by wrapping it in ret: ⟦v⟧ ≜ ret ⟦v⟧_val.
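
The combinators of Fig. 1 can be transcribed almost literally into Python, which may help readers unfamiliar with reader monads. The following is a sketch under assumptions, not the HeapLang definitions: a monadic expression is a Python function taking `env` (a `set`) and `lk` (a `threading.Lock`); the stand-in `par` does not run threads but executes its operands sequentially in an order selected by the hypothetical parameter `left_first`; and `add` and `mark_written` are hypothetical helpers.

```python
import threading

def ret(v):
    return lambda env, lk: v

def bind(m, k):
    # x <- m; k x: run m, then pass its result to the continuation k
    return lambda env, lk: k(m(env, lk))(env, lk)

def par(m1, m2, left_first=True):
    # Stand-in for parallel composition: one fixed sequential order
    def go(env, lk):
        if left_first:
            a = m1(env, lk)
            b = m2(env, lk)
        else:
            b = m2(env, lk)
            a = m1(env, lk)
        return (a, b)
    return go

def atomic_env(f):
    # Run f on the environment inside the critical section guarded by lk
    def go(env, lk):
        with lk:
            return f(env)
    return go

def atomic(m):
    # Run m inside the critical section, handing it a fresh mutex
    def go(env, lk):
        with lk:
            return m(env, threading.Lock())
    return go

def run(m):
    return m(set(), threading.Lock())

def add(m1, m2, left_first=True):
    return bind(par(m1, m2, left_first), lambda p: ret(p[0] + p[1]))

def mark_written(loc):
    # Hypothetical helper: the environment check and update of an assignment
    def body(env):
        assert loc not in env, "sequence point violation"
        env.add(loc)
        return loc
    return atomic_env(body)

print(run(add(ret(3), ret(4))))  # 7
```

Because `atomic` passes a fresh lock to its body, a nested `atomic_env` or `atomic` inside a function body does not block on the lock held by the caller, which is the reason for using multiple mutexes.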
A global function definition f(x){e} is translated as a top level let-binding. A
function call is then just an atomically executed function invocation in HeapLang, 
modulo the fact that the function pointer and the arguments are computed in 
parallel. In addition, sequence points occur at the beginning of each function call 
and at the end of each function body [22, Annex C], and we reflect that in our 
translation by clearing the environment at appropriate places. 

Our semantics by translation can easily be extended to cover other features of 
C, e.g., a more advanced memory model (see Sect. 7). However, the fragment
presented here already illustrates the challenges that non-determinism and sequence
point violations pose for verification. In the next section we describe a logic for 
reasoning about the semantics by translation given in this section. 


3 Separation Logic with Weakest Preconditions for AMC 


In this section we present a separation logic with weakest precondition
propositions for reasoning about AMC programs. The logic tackles the main features of
our semantics: non-determinism in expression evaluation and sequence point
violations. We will discuss the high-level rules of the logic pertaining to C
connectives by going through a series of small examples.

The logic presented here is similar to the separation logic by Krebbers [29], 
but it is given in a weakest precondition style, and moreover, it is constructed 
synthetically on top of the separation logic framework Iris [24–26,33], whereas
the logic by Krebbers [29] is interpreted directly in a bespoke model. 

The following grammar defines the formulas of the logic: 


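\[
\begin{array}{rcl}
P, Q \in \mathit{Prop} & ::= & \mathsf{True} \mid \mathsf{False} \mid \forall x.\ P \mid \exists x.\ P \mid v_1 = v_2 \mid l \mapsto^{q}_{\xi} v \mid {} \qquad (q \in (0,1]) \\
 & & P \ast Q \mid P \mathrel{-\!\!\ast} Q \mid \mathbb{U}\, P \mid \mathsf{wp}\ e\ \{\Phi\} \mid \dots \qquad (\xi \in \{L, U\})
\end{array}
\]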


Most of the connectives are commonplace in separation logic, with the exception 
of the modified points-to connective, which we describe in this section. 

As is common, Hoare triples $\{P\}\ e\ \{\Phi\}$ are syntactic sugar for $P \vdash \mathsf{wp}\ e\ \{\Phi\}$.
The weakest precondition connective $\mathsf{wp}\ e\ \{\Phi\}$ states that the program $e$ is safe
(the program has defined behavior), and if $e$ terminates to a value $v$, then $v$
satisfies the predicate $\Phi$. We write $\mathsf{wp}\ e\ \{v.\ \Phi\, v\}$ for $\mathsf{wp}\ e\ \{\lambda v.\ \Phi\, v\}$.

Contrary to the paper by Krebbers [29], we use weakest preconditions instead 
of Hoare triples throughout this paper. There are several reasons for doing so: 


1. We do not have to manipulate the preconditions explicitly, e.g., by applying 
the consequence rule to the precondition. 

2. The soundness of our symbolic executor (Theorem 5.1) can be stated more 
concisely using weakest precondition propositions. 

3. It is more convenient to integrate weakest preconditions into the Iris Proof 
Mode/MoSeL framework in Coq that we use for our implementation (Sect. 7). 

A selection of rules is presented in Fig. 3. Each inference rule $\frac{P_1 \ \cdots\ P_n}{Q}$ in
this paper should be read as the entailment $P_1 \ast \dots \ast P_n \vdash Q$. We now explain
and motivate the rules of our logic.
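\[ \frac{\Phi\, v}{\mathsf{wp}\ v\ \{\Phi\}} \quad \text{(WP-VALUE)} \]
\[ \frac{\mathsf{wp}\ e\ \{\Phi\} \qquad (\forall v.\ \Phi\, v \mathrel{-\!\!\ast} \Psi\, v)}{\mathsf{wp}\ e\ \{\Psi\}} \quad \text{(WP-WAND)} \]
\[ \frac{\mathsf{wp}\ e_1\ \{v.\ \mathbb{U}(\mathsf{wp}\ e_2[v/x]\ \{\Phi\})\}}{\mathsf{wp}\ (x \leftarrow e_1 ; e_2)\ \{\Phi\}} \quad \text{(WP-SEQ)} \]
\[ \frac{\mathsf{wp}\ e_1\ \{\Psi_1\} \qquad \mathsf{wp}\ e_2\ \{\Psi_2\} \qquad (\forall w_1\, w_2.\ \Psi_1\, w_1 \ast \Psi_2\, w_2 \mathrel{-\!\!\ast} \Phi(w_1 \mathbin{[\![\circledcirc]\!]} w_2))}{\mathsf{wp}\ (e_1 \circledcirc e_2)\ \{\Phi\}} \quad \text{(WP-BIN-OP)} \]
\[ \frac{\mathsf{wp}\ e\ \{l.\ \exists w\, q.\ l \mapsto^{q}_{U} w \ast (l \mapsto^{q}_{U} w \mathrel{-\!\!\ast} \Phi\, w)\}}{\mathsf{wp}\ (*e)\ \{\Phi\}} \quad \text{(WP-LOAD)} \]
\[ \frac{\mathsf{wp}\ e\ \{v.\ \forall l.\ l \mapsto_{U} v \mathrel{-\!\!\ast} \Phi\, l\}}{\mathsf{wp}\ \mathsf{alloc}(e)\ \{\Phi\}} \quad \text{(WP-ALLOC)} \]
\[ \frac{\mathsf{wp}\ e_1\ \{\Psi_1\} \qquad \mathsf{wp}\ e_2\ \{\Psi_2\} \qquad (\forall l\, w.\ \Psi_1\, l \ast \Psi_2\, w \mathrel{-\!\!\ast} \exists v.\ l \mapsto_{U} v \ast (l \mapsto_{L} w \mathrel{-\!\!\ast} \Phi\, w))}{\mathsf{wp}\ (e_1 = e_2)\ \{\Phi\}} \quad \text{(WP-STORE)} \]
\[ \frac{\mathsf{wp}\ e\ \{l.\ \exists v.\ l \mapsto_{U} v \ast \Phi\ ()\}}{\mathsf{wp}\ \mathsf{free}(e)\ \{\Phi\}} \quad \text{(WP-FREE)} \]
\[ l \mapsto^{q_1}_{\xi_1} v \ast l \mapsto^{q_2}_{\xi_2} v \;\dashv\vdash\; l \mapsto^{q_1 + q_2}_{\xi_1 \sqcup \xi_2} v \quad \text{(MAPSTO-SPLIT)} \]
\[ \frac{l \mapsto^{q_1}_{\xi_1} v_1 \qquad l \mapsto^{q_2}_{\xi_2} v_2}{v_1 = v_2} \quad \text{(MAPSTO-VALUES-AGREE)} \]
\[ \frac{l \mapsto_{L} v}{\mathbb{U}(l \mapsto_{U} v)} \quad \text{(U-UNLOCK)} \qquad \frac{P \mathrel{-\!\!\ast} Q}{\mathbb{U} P \mathrel{-\!\!\ast} \mathbb{U} Q} \quad \text{(U-MONO)} \]
\[ \frac{P}{\mathbb{U} P} \quad \text{(U-INTRO)} \qquad \frac{\mathbb{U} P \ast \mathbb{U} Q}{\mathbb{U}(P \ast Q)} \quad \text{(U-SEP)} \]


Fig. 3. Selected rules for weakest preconditions.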


Non-determinism. In the introduction (Sect. 1) we have already shown the 
rule for addition from Krebbers’s logic [29], which was written using Hoare 
triples. Using weakest preconditions, the corresponding rule (WP-BIN-OP) is: 


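\[
\frac{\mathsf{wp}\ e_1\ \{\Psi_1\} \qquad \mathsf{wp}\ e_2\ \{\Psi_2\} \qquad (\forall w_1\, w_2.\ \Psi_1\, w_1 \ast \Psi_2\, w_2 \mathrel{-\!\!\ast} \Phi(w_1 \mathbin{[\![\circledcirc]\!]} w_2))}
     {\mathsf{wp}\ (e_1 \circledcirc e_2)\ \{\Phi\}}
\]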


This rule closely resembles the usual rule for parallel composition in ordinary
concurrent separation logic [46]. This should not be surprising, as we have given
a definitional semantics to binary operators using the parallel composition
operator. It is important to note that the premises of WP-BIN-OP are combined using the
separating conjunction $\ast$. This ensures that the weakest preconditions $\mathsf{wp}\ e_1\ \{\Psi_1\}$
and $\mathsf{wp}\ e_2\ \{\Psi_2\}$ for the subexpressions $e_1$ and $e_2$ are verified with respect to
disjoint resources. As such they do not interfere with each other, and can be
evaluated in parallel without causing sequence point violations.

To see how one can use the rule WP-BIN-OP, let us verify $P \vdash \mathsf{wp}\ (e_1 + e_2)\ \{\Phi\}$.
That is, we want to show that $(e_1 + e_2)$ satisfies the postcondition
$\Phi$ assuming the precondition $P$. This goal can be proven by separating the
precondition $P$ into disjoint parts $P_1 \ast P_2 \ast R \dashv\vdash P$. Then, using WP-BIN-OP,
the goal can be reduced to proving $P_i \vdash \mathsf{wp}\ e_i\ \{\Psi_i\}$ for $i \in \{1, 2\}$, and
$R \ast \Psi_1\, w_1 \ast \Psi_2\, w_2 \vdash \Phi(w_1 \mathbin{[\![\circledcirc]\!]} w_2)$ for any return values $w_i$ of the expressions
$e_i$.


Fractional Permissions. Separation logic includes the points-to connective
$l \mapsto v$, which asserts unique ownership of a location $l$ with value $v$. This
connective is used to specify the behavior of stateful operations, which becomes
apparent in the following proposed rule for load:

\[
\frac{\mathsf{wp}\ e\ \{l.\ \exists w.\ l \mapsto w \ast (l \mapsto w \mathrel{-\!\!\ast} \Phi\, w)\}}
     {\mathsf{wp}\ (*e)\ \{\Phi\}}
\]

In order to verify *e we first make sure that e evaluates to a location $l$, and
then we need to provide the points-to connective $l \mapsto w$ for some value $w$ stored at
the location. This rule, together with WP-VALUE, allows for verification of simple
programs like $l \mapsto v \vdash \mathsf{wp}\ (*l)\ \{w.\ w = v \ast l \mapsto v\}$.

However, the rule above is too weak. Suppose that we wish to verify the
program *l + *l from the precondition $l \mapsto v$. According to WP-BIN-OP, we have
to separate the proposition $l \mapsto v$ into two disjoint parts, each used to verify
one of the load operations, but a connective that asserts unique ownership cannot
be split in this way. In order to enable sharing of points-to connectives we use
fractional permissions [7,8]. In separation logic with fractional permissions each
points-to connective is annotated with a fraction $q \in (0,1]$, and the resources
can be split in accordance with those fractions:

\[
l \mapsto^{q_1 + q_2} v \;\dashv\vdash\; l \mapsto^{q_1} v \ast l \mapsto^{q_2} v.
\]

A connective $l \mapsto^{1} v$ provides unique ownership of the location, and we refer
to it as a write permission. A points-to connective with $q < 1$ provides shared
ownership of the location, referred to as a read permission. By convention, we
write $l \mapsto v$ to denote the write permission $l \mapsto^{1} v$.

With fractional permissions at hand, we can relax the proposed load rule by
allowing a location to be dereferenced even if we only have a read permission:

\[
\frac{\mathsf{wp}\ e\ \{l.\ \exists w\, q.\ l \mapsto^{q} w \ast (l \mapsto^{q} w \mathrel{-\!\!\ast} \Phi\, w)\}}
     {\mathsf{wp}\ (*e)\ \{\Phi\}}
\]

This corresponds to the intuition that multiple subexpressions can safely
dereference the same location, but not write to it.
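Using the rule above we can verify $l \mapsto 1 \vdash \mathsf{wp}\ (*l + *l)\ \{v.\ v = 2 \ast l \mapsto 1\}$
by splitting the assumption into $l \mapsto^{0.5} 1 \ast l \mapsto^{0.5} 1$ and first applying WP-BIN-OP
with $\Psi_1$ and $\Psi_2$ being $\lambda v.\ (v = 1) \ast l \mapsto^{0.5} 1$. Then we apply WP-LOAD on both
subgoals. After that, we can use MAPSTO-SPLIT to prove the remaining formula:

\[
(v_1 = 1) \ast l \mapsto^{0.5} 1 \ast (v_2 = 1) \ast l \mapsto^{0.5} 1 \vdash (v_1 + v_2 = 2) \ast l \mapsto 1.
\]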
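
The accounting behind this proof can be replayed with exact rational arithmetic. The Python sketch below is an informal model and not part of the logic: permissions are represented as run-time values, whereas in the logic they exist only in proofs, and the names `PointsTo`, `split`, `join`, and `load` are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class PointsTo:
    loc: str
    q: Fraction      # share of ownership, must lie in (0, 1]
    value: int

    def __post_init__(self):
        if not (0 < self.q <= 1):
            raise ValueError("fraction must lie in (0, 1]")

def split(p, q1):
    """Split p into a share q1 and the remaining share p.q - q1."""
    return PointsTo(p.loc, q1, p.value), PointsTo(p.loc, p.q - q1, p.value)

def join(p1, p2):
    """Recombine two shares of the same location holding the same value."""
    if p1.loc != p2.loc or p1.value != p2.value:
        raise ValueError("shares must agree on location and value")
    return PointsTo(p1.loc, p1.q + p2.q, p1.value)

def load(p):
    """A load needs some share and gives it back unchanged."""
    return p.value, p

full = PointsTo("l", Fraction(1), 1)
left, right = split(full, Fraction(1, 2))   # one half for each operand
v1, left = load(left)
v2, right = load(right)
print(v1 + v2, join(left, right) == full)   # 2 True
```

Each operand of *l + *l receives one half, and after both loads the halves recombine into the write permission required by the postcondition.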
The Assignment Operator. The second main operation that accesses the
heap is the assignment operator e1 = e2. The arguments on both sides of the
assignment are evaluated in parallel, and a points-to connective is required to
perform an update to the heap. A naive version of the assignment rule can be
obtained by combining the binary operation rule and the load rule:

\[
\frac{\mathsf{wp}\ e_1\ \{\Psi_1\} \qquad \mathsf{wp}\ e_2\ \{\Psi_2\} \qquad (\forall l\, w.\ \Psi_1\, l \ast \Psi_2\, w \mathrel{-\!\!\ast} \exists v.\ l \mapsto v \ast (l \mapsto w \mathrel{-\!\!\ast} \Phi\, w))}
     {\mathsf{wp}\ (e_1 = e_2)\ \{\Phi\}}
\]

The write permission $l \mapsto v$ can be obtained by combining the resources of both
sides of the assignment. This allows us to verify programs like l = *l + *l.

However, the rule above is unsound, because it fails to account for sequence
point violations. We could use the rule above to prove safety of undefined
programs, e.g., the program l = (l = 3).

To account for sequence point violations we decorate the points-to connectives
$l \mapsto^{q}_{\xi} v$ with access levels $\xi \in \{L, U\}$. These have the following
semantics: we can read from and write to a location that is unlocked ($U$), and the
location becomes locked ($L$) once someone writes to it. Proposition $l \mapsto^{q}_{U} v$
(resp. $l \mapsto^{q}_{L} v$) asserts ownership of the unlocked (resp. locked) location $l$.
We refer to such propositions as lockable points-to connectives. Using lockable
points-to connectives we can formulate the correct assignment rule:

\[
\frac{\mathsf{wp}\ e_1\ \{\Psi_1\} \qquad \mathsf{wp}\ e_2\ \{\Psi_2\} \qquad (\forall l\, w.\ \Psi_1\, l \ast \Psi_2\, w \mathrel{-\!\!\ast} \exists v.\ l \mapsto_{U} v \ast (l \mapsto_{L} w \mathrel{-\!\!\ast} \Phi\, w))}
     {\mathsf{wp}\ (e_1 = e_2)\ \{\Phi\}}
\]

The set $\{L, U\}$ has a lattice structure with $L \le U$, and the levels can be
combined with a join operation, see MAPSTO-SPLIT. By convention, $l \mapsto_{\xi} v$ denotes
$l \mapsto^{1}_{\xi} v$.


The Unlocking Modality. As locations become locked after using the assignment
rule, we wish to unlock them in order to perform further heap operations.
For instance, in the expression l = 4; *l the location l becomes unlocked after
the sequence point “;” between the store and the dereferencing operations. To
reflect this in the logic, we use the rule WP-SEQ which features the unlocking
modality $\mathbb{U}$ (which is called the unlocking assertion in [29, Definition 5.6]):

\[
\frac{\mathsf{wp}\ e_1\ \{\_.\ \mathbb{U}(\mathsf{wp}\ e_2\ \{\Phi\})\}}
     {\mathsf{wp}\ (e_1 ; e_2)\ \{\Phi\}}
\]

Intuitively, $\mathbb{U} P$ states that $P$ holds after unlocking all locations. The rules of $\mathbb{U}$
in Fig. 3 allow one to turn $(P_1 \ast \dots \ast P_m) \ast (l_1 \mapsto_{L} v_1 \ast \dots \ast l_m \mapsto_{L} v_m) \vdash \mathbb{U} Q$
into $(P_1 \ast \dots \ast P_m) \ast (l_1 \mapsto_{U} v_1 \ast \dots \ast l_m \mapsto_{U} v_m) \vdash Q$. This is done by applying
either U-UNLOCK or U-INTRO to each premise; then collecting all premises into
one formula under $\mathbb{U}$ by U-SEP; and finally, applying U-MONO to the whole
sequent.
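
The interplay between access levels and sequence points can be illustrated with a small Python model. As before, this is a sketch under assumptions rather than the logic itself: permissions are modeled as run-time values, the function `unlock` stands in for the effect of the modality at a sequence point, and the names `Lockable`, `store`, `load`, and `unlock` are hypothetical.

```python
from dataclasses import dataclass, replace
from fractions import Fraction

@dataclass(frozen=True)
class Lockable:
    loc: str
    q: Fraction
    level: str    # "U" (unlocked) or "L" (locked)
    value: int

def store(p, new_value):
    """Consume a full unlocked permission and hand back a locked one."""
    if p.q != 1 or p.level != "U":
        raise PermissionError("store needs the full unlocked permission")
    return replace(p, level="L", value=new_value)

def load(p):
    """Any unlocked share suffices for a load."""
    if p.level != "U":
        raise PermissionError("load needs an unlocked permission")
    return p.value

def unlock(p):
    """Effect of a sequence point: a locked location becomes unlocked."""
    return replace(p, level="U")

# l = 4; *l is accepted: the sequence point unlocks l before the load.
p = store(Lockable("l", Fraction(1), "U", 0), 4)
p = unlock(p)
print(load(p))      # 4

# l = (l = 3) is rejected: the inner store leaves l locked.
inner = store(Lockable("l", Fraction(1), "U", 0), 3)
try:
    store(inner, 3)
    rejected = False
except PermissionError:
    rejected = True
print(rejected)     # True
```

The second example fails for the same reason that the corrected assignment rule cannot be applied: the outer assignment needs an unlocked permission, but the only permission available after the inner assignment is locked.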
4 Soundness of Weakest Preconditions for AMC


In this section we prove adequacy of the separation logic with weakest precon- 
ditions for AMC as presented in Sect. 3. We do this by giving a model using the 
Iris framework that is structured in a similar way as the translation that we 
gave in Sect. 2. This translation consisted of three layers: the target HeapLang 
language, the monadic combinators, and the AMC operations themselves. In the 
model, each corresponding layer abstracts from the details of the previous layer, 
in such a way that we never have to break the abstraction of a layer. At the end, 
putting all of this together, we get the following adequacy statement: 


Theorem 4.1 (Adequacy of Weakest Preconditions). If $\mathsf{wp}\ e\ \{\Phi\}$ is
derivable, then e has no undefined behavior for any evaluation order. In other
words, run(e) does not assert false.


The proof of the adequacy theorem closely follows the layered structure, 
by combining the correctness of the monadic run combinator with adequacy of 
HeapLang in Iris [25, Theorem 6]. The rest of this section is organized as follows:


1. Because our translation targets HeapLang, we start by recalling the separation
logic with weakest preconditions for HeapLang, which is part of Iris (Sect. 4.1).

2. On top of the logic for HeapLang, we define a notion of weakest preconditions
$\mathsf{wp}_{\mathsf{mon}}\ e\ \{\Phi\}$ for expressions e built from our monadic combinators (Sect. 4.2).

3. Next, we define the lockable points-to connective $l \mapsto_{\xi}^{q} v$ using Iris's
machinery for custom ghost state (Sect. 4.3).

4. Finally, we define weakest preconditions for AMC by combining the weakest
preconditions for monadic expressions with our translation scheme (Sect. 4.4).


4.1 Weakest Preconditions for HeapLang 


We recall the most essential Iris connectives for reasoning about HeapLang
programs: $\mathsf{wp}_{\mathsf{HL}}\ e\ \{\Phi\}$ and $\ell \mapsto_{\mathsf{HL}} v$, which are the HeapLang weakest
precondition proposition and the HeapLang points-to connective, respectively.
Other Iris connectives are described in [6, Section 8.1] or [25,33]. An example
rule is the store rule for HeapLang, shown in Fig. 4. The rule requires a points-to
connective $\ell \mapsto_{\mathsf{HL}} v$, and the user receives the updated points-to connective
$\ell \mapsto_{\mathsf{HL}} w$ back for proving $\Phi\,()$. Note that the rule is formulated for a concrete
location $\ell$ and a value $w$, instead of arbitrary expressions. This does not limit
the expressive power; since the evaluation order in HeapLang is deterministic$^1$,
arbitrary expressions can be handled using the WP$_{\mathsf{HL}}$-BIND rule. Using this rule,
one can bind an expression e in an arbitrary evaluation context K. We can thus
use the WP$_{\mathsf{HL}}$-BIND rule twice to derive a more general store rule for HeapLang:


\[
\frac{\mathsf{wp}_{\mathsf{HL}}\ e_2\ \bigl\{ w.\ \mathsf{wp}_{\mathsf{HL}}\ e_1\ \bigl\{ \ell.\ (\exists v.\ \ell \mapsto_{\mathsf{HL}} v) * (\ell \mapsto_{\mathsf{HL}} w \mathrel{-\!\!*} \Phi\,()) \bigr\} \bigr\}}
     {\mathsf{wp}_{\mathsf{HL}}\ (e_1 :=_{\mathsf{HL}} e_2)\ \{\Phi\}}
\]


1 And right-to-left, although our monadic translation does not rely on that. 
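\[
(\ell \mapsto_{\mathsf{HL}} v) * (\ell \mapsto_{\mathsf{HL}} v \mathrel{-\!\!*} \Phi\, v) \vdash \mathsf{wp}_{\mathsf{HL}}\ (!\,\ell)\ \{\Phi\}
\]
\[
(\ell \mapsto_{\mathsf{HL}} v) * (\ell \mapsto_{\mathsf{HL}} w \mathrel{-\!\!*} \Phi\,()) \vdash \mathsf{wp}_{\mathsf{HL}}\ (\ell :=_{\mathsf{HL}} w)\ \{\Phi\}
\]

WP$_{\mathsf{HL}}$-BIND
\[
\frac{\mathsf{wp}_{\mathsf{HL}}\ e\ \{v.\ \mathsf{wp}_{\mathsf{HL}}\ K[v]\ \{\Phi\}\}}
     {\mathsf{wp}_{\mathsf{HL}}\ K[e]\ \{\Phi\}}
\]

\[
\begin{aligned}
R * (\forall \gamma\, \mathit{lk}.\ \mathsf{is\_mutex}(\gamma, \mathit{lk}, R) \mathrel{-\!\!*} \Phi\, \mathit{lk}) &\vdash \mathsf{wp}_{\mathsf{HL}}\ \mathsf{newmutex}\ ()\ \{\Phi\} \\
\mathsf{is\_mutex}(\gamma, \mathit{lk}, R) * (R * \mathsf{locked}(\gamma) \mathrel{-\!\!*} \Phi\,()) &\vdash \mathsf{wp}_{\mathsf{HL}}\ \mathsf{acquire}\ \mathit{lk}\ \{\Phi\} \\
\mathsf{is\_mutex}(\gamma, \mathit{lk}, R) * R * \mathsf{locked}(\gamma) * \Phi\,() &\vdash \mathsf{wp}_{\mathsf{HL}}\ \mathsf{release}\ \mathit{lk}\ \{\Phi\} \\
\mathsf{is\_mutex}(\gamma, \mathit{lk}, R) * \mathsf{is\_mutex}(\gamma, \mathit{lk}, R) &\dashv\vdash \mathsf{is\_mutex}(\gamma, \mathit{lk}, R) \qquad \text{(ISMUTEX-DUPL)}
\end{aligned}
\]


Fig. 4. Selected $\mathsf{wp}_{\mathsf{HL}}$ rules.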


To verify the monadic combinators and the translation of AMC operations in 
the upcoming Sects. 4.2 and 4.4, we need the specifications for all the functions 
that we use, including those on mutable sets and mutexes. The rules for mutable 
sets are standard, and thus omitted. They involve the usual abstract predicate 
is_mset(s, X) stating that the reference s represents a set with contents X. The 
rules for mutexes are presented in Fig. 4. When a new mutex is created, a user
gets access to a proposition $\mathsf{is\_mutex}(\gamma, \mathit{lk}, R)$, which states that the value lk is
a mutex containing the resources R. This proposition can be duplicated freely
(ISMUTEX-DUPL). A thread can acquire the mutex and receive the resources
contained in it. In addition, the thread receives a token $\mathsf{locked}(\gamma)$ meaning that
it has entered the critical section. When a thread leaves the critical section and
releases the mutex, it has to give up both the token and the resources R.


4.2 Weakest Preconditions for Monadic Expressions 


As a next step, we define a weakest precondition proposition $\mathsf{wp}_{\mathsf{mon}}\ e\ \{\Phi\}$ for a
monadic expression e. The definition is constructed in the ambient logic, and
it encapsulates the monadic operations in a separate layer. Due to that, we are
able to carry out proofs of high-level specifications without breaking the
abstraction (Sect. 4.4). The specifications for selected monadic operations in terms of
$\mathsf{wp}_{\mathsf{mon}}$ are presented in Fig. 5. We define the weakest precondition for a monadic
expression e as follows:


\[
\mathsf{wp}_{\mathsf{mon}}\ e\ \{\Phi\} \triangleq
\mathsf{wp}_{\mathsf{HL}}\ e\ \bigl\{ g.\ \forall \gamma\ \mathit{env}\ \mathit{lk}.\ \mathsf{is\_mutex}(\gamma, \mathit{lk}, \mathsf{env\_inv}(\mathit{env})) \mathrel{-\!\!*}
\mathsf{wp}_{\mathsf{HL}}\ (g\ \mathit{env}\ \mathit{lk})\ \{\Phi\} \bigr\}
\]


The idea is that we first reduce e to a monadic value g. To perform this reduction
we have the outermost $\mathsf{wp}_{\mathsf{HL}}$ connective in the definition of $\mathsf{wp}_{\mathsf{mon}}$. This monadic
value is then evaluated with an arbitrary environment and an arbitrary mutex.
Note that we universally quantify over any mutex lk to support nested locking
in atomic. This definition is parameterized by an environment invariant
env_inv(env), which describes the resources accessible in the critical sections. We
show how to define env_inv in the next subsection.
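WP-RET
\[
\frac{\mathsf{wp}_{\mathsf{HL}}\ e\ \{\Phi\}}{\mathsf{wp}_{\mathsf{mon}}\ (\mathsf{ret}\ e)\ \{\Phi\}}
\]

WP-BIND
\[
\frac{\mathsf{wp}_{\mathsf{mon}}\ e_1\ \{v.\ \mathsf{wp}_{\mathsf{mon}}\ e_2[v/x]\ \{\Phi\}\}}{\mathsf{wp}_{\mathsf{mon}}\ (x \leftarrow e_1; e_2)\ \{\Phi\}}
\]

WP-PAR
\[
\frac{\mathsf{wp}_{\mathsf{mon}}\ e_1\ \{\Psi_1\} \qquad \mathsf{wp}_{\mathsf{mon}}\ e_2\ \{\Psi_2\} \qquad (\forall w_1\, w_2.\ \Psi_1\, w_1 * \Psi_2\, w_2 \mathrel{-\!\!*} \Phi\,(w_1, w_2))}
     {\mathsf{wp}_{\mathsf{mon}}\ (e_1 \parallel e_2)\ \{\Phi\}}
\]

WP-ATOMIC-ENV
\[
\frac{\forall \mathit{env}.\ \mathsf{env\_inv}(\mathit{env}) \mathrel{-\!\!*} \mathsf{wp}_{\mathsf{HL}}\ (v\ \mathit{env})\ \{w.\ \mathsf{env\_inv}(\mathit{env}) * \Phi\, w\}}
     {\mathsf{wp}_{\mathsf{mon}}\ (\mathsf{atomic\_env}\ v)\ \{\Phi\}}
\]


Fig. 5. Selected monadic $\mathsf{wp}_{\mathsf{mon}}$ rules.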


Using this definition we derive the monadic rules in Fig. 5. In a monad, the
expression evaluation order is made explicit via the bind operation $x \leftarrow e_1; e_2$.
To that extent, contrary to HeapLang, we no longer have a rule like WP$_{\mathsf{HL}}$-BIND,
which allows one to bind an expression in a general evaluation context. Instead, we
have the rule WP-BIND, which reflects that the only evaluation context we have
is the monadic bind $x \leftarrow [\,\bullet\,]; e$.


4.3 Modeling the Heap 


The monadic rules in Fig. 5 are expressive enough to derive some of the AMC-level
rules, but we are still missing one crucial part: handling of the heap. In
order to do that, we need to define lockable points-to connectives $l \mapsto_{\xi}^{q} v$ in such
a way that they are linked to the HeapLang points-to connectives $\ell \mapsto_{\mathsf{HL}} v$.

The key idea is the following. The environment invariant env_inv of monadic
weakest preconditions will track all HeapLang points-to connectives $\ell \mapsto_{\mathsf{HL}} v$ that
have ever been allocated at the AMC level. Via Iris ghost state, we then connect
this knowledge to the lockable points-to connectives $l \mapsto_{\xi}^{q} v$. We refer to the
construction that allows us to carry this out as the lockable heap. Note that the
description of the lockable heap is fairly technical and requires an understanding of
the ghost state mechanism in Iris.

A lockable heap is a map $\sigma : \mathsf{Loc} \rightharpoonup_{\mathrm{fin}} \{L, U\} \times \mathsf{Val}$ that keeps track of the
access levels and values associated with the locations. The connective full_heap($\sigma$)
asserts the ownership of all the locations present in the domain of $\sigma$. Specifically,
it asserts $\ell \mapsto_{\mathsf{HL}} v$ for each $\{\ell \leftarrow (\xi, v)\} \in \sigma$. The connective $\ell \mapsto_{\xi}^{q} v$ then states
that $\{\ell \leftarrow (\xi, v)\}$ is part of the global lockable heap, and it asserts this with the
fractional permission $q$. We treat the lockable heap as an opaque abstraction,
whose exact implementation via Iris ghost state is described in the Coq
formalization [18]. The main interface of the lockable heap consists of the rules in Fig. 6.
The rule HEAP-ALLOC states that we can turn a HeapLang points-to connective
$\ell \mapsto_{\mathsf{HL}} v$ into $\ell \mapsto_{U} v$ by changing the lockable heap $\sigma$ accordingly. The
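HEAP-ALLOC
\[
\frac{\ell \mapsto_{\mathsf{HL}} v \qquad \mathsf{full\_heap}(\sigma)}
     {{\mid\!\Rrightarrow}\ \ell \mapsto_{U} v * \mathsf{full\_heap}(\sigma[\ell \leftarrow (U, v)])}
\]

HEAP-UPD
\[
\frac{\ell \mapsto_{U} v \qquad \mathsf{full\_heap}(\sigma)}
     {\sigma(\ell) = (U, v) * \ell \mapsto_{\mathsf{HL}} v * \bigl(\forall v'\, \xi'.\ \ell \mapsto_{\mathsf{HL}} v' \mathrel{-\!\!*} {\mid\!\Rrightarrow}\ \ell \mapsto_{\xi'} v' * \mathsf{full\_heap}(\sigma[\ell \leftarrow (\xi', v')])\bigr)}
\]


Fig. 6. Selected rules of the lockable heap construction.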


rule HEAP-UPD states that given $\ell \mapsto_{U} v$, we can temporarily get a HeapLang
points-to connective $\ell \mapsto_{\mathsf{HL}} v$ out of the lockable heap and update its value.
The environment invariant env_inv(env) in the definition of $\mathsf{wp}_{\mathsf{mon}}$ ties the
contents of the lockable heap to the contents of the environment env:


\[
\mathsf{env\_inv}(\mathit{env}) \triangleq \exists \sigma\, X.\ \mathsf{is\_set}(\mathit{env}, X) * \mathsf{full\_heap}(\sigma) * (\forall \ell \in X.\ \exists v.\ \sigma(\ell) = (L, v))
\]


The first conjunct states that $X : \wp^{\mathrm{fin}}(\mathsf{Loc})$ is a set of locked locations, according
to the environment env. The second conjunct asserts ownership of the global
lockable heap $\sigma$. Finally, the last conjunct states that the contents of env agree
with the lockable heap: every location that is in X is locked according to $\sigma$.
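
The last conjunct is a pure condition on $\sigma$ and X, so it can be illustrated directly. The following is a minimal sketch, assuming a lockable heap is modeled as a Python dictionary from location names to (level, value) pairs; the name env_agrees and this encoding are hypothetical and are not taken from the Coq development.

```python
from typing import Dict, Set, Tuple

LOCKED, UNLOCKED = "L", "U"


def env_agrees(sigma: Dict[str, Tuple[str, int]], locked_set: Set[str]) -> bool:
    """Pure part of env_inv: every location in X is present and locked in sigma."""
    return all(loc in sigma and sigma[loc][0] == LOCKED for loc in locked_set)


# Hypothetical lockable heap: l1 is locked, l2 is unlocked.
sigma = {"l1": (LOCKED, 4), "l2": (UNLOCKED, 7)}
print(env_agrees(sigma, {"l1"}))        # True
print(env_agrees(sigma, {"l1", "l2"}))  # False
```

The sketch covers only the agreement condition; the ownership conjuncts is_set and full_heap are separation logic resources and have no counterpart in it.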


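The Unlocking Modality. The unlocking modality is defined in the logic as:


\[
\mathbb{U} P \triangleq \exists S.\ \Bigl( \mathop{\ast}_{(l, v, q) \in S} l \mapsto_{L}^{q} v \Bigr) * \Bigl( \Bigl( \mathop{\ast}_{(l, v, q) \in S} l \mapsto_{U}^{q} v \Bigr) \mathrel{-\!\!*} P \Bigr)
\]


Here S is a finite multiset of tuples containing locations, values, and fractions.
The unlocking modality accumulates the locked locations, waiting for them to be
unlocked at a sequence point.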


4.4 Deriving the AMC Rules 


To model weakest preconditions for AMC (Fig. 3) we compose the construction
we have just defined with the translation of Sect. 2:
$\mathsf{wp}\ e\ \{\Phi\} \triangleq \mathsf{wp}_{\mathsf{mon}}\ [\![e]\!]\ \{\Phi'\}$.
Here, $\Phi'$ is the obvious lifting of $\Phi$ from AMC values to HeapLang values. Using
the rules from Figs. 5 and 6 we derive the high-level AMC rules without unfolding
the definition of the monadic $\mathsf{wp}_{\mathsf{mon}}$.


Example 4.2. Consider the rule WP-STORE for assignments $e_1 = e_2$. Using
WP-BIND and WP-PAR, the soundness of WP-STORE can be reduced to verifying
the assignment with $e_1$ being $l$, $e_2$ being $v'$, under the assumption $l \mapsto_{U} v$.
We use WP-ATOMIC-ENV to turn our goal into a HeapLang weakest precondition
proposition and to gain access to an environment env, and to the proposition
env_inv(env), from which we extract the lockable heap $\sigma$. We then use HEAP-UPD
to get access to the underlying HeapLang location and obtain that $l$ is not locked
according to $\sigma$. Due to the environment invariant, we obtain that $l$ is not in env,
which allows us to prove the assert for sequence point violation in the
interpretation of the assignment. Finally, we perform the physical update of the location.


5 A Symbolic Executor for AMC 


In order to turn our program logic into an automated procedure, it is important 
to have rules for weakest preconditions that have an algorithmic form. However, 
the rules for binary operators in our separation logic for AMC do not have such 
a form. Take for example the rule WP-BIN-OP for binary operators $e_1 \circledcirc e_2$. This
rule cannot be applied in an algorithmic manner. To use the rule one should
supply the postconditions for $e_1$ and $e_2$, and frame the resources from the context
into two disjoint parts. This is generally impossible to do automatically.

To address this problem, we first describe how the rules for binary operators 
can be transformed into algorithmic rules by exploiting the notion of symbolic 
execution [5] (Sect. 5.1). We then show how to implement these algorithmic rules 
as part of an automated symbolic execution procedure (Sect. 5.2). 


5.1 Rules for Symbolic Execution 


We say that we can symbolically execute an expression e using a precondition P,
if we can find a symbolic execution tuple (w, Q, R) consisting of a return value w,
a postcondition Q, and a frame R satisfying:


\[
P \vdash \mathsf{wp}\ e\ \{v.\ v = w * Q\} * R
\]


This specification is much like that of ordinary symbolic execution in separation
logic [5], but there is an important difference. Apart from computing the
postcondition Q and the return value w, there is also the frame R, which describes the
resources that are not used for proving e. For instance, if the precondition P is
$P' * l \mapsto_{U}^{q} w$ and e is a load operation $*l$, then we can symbolically execute e with
the postcondition Q being $l \mapsto_{U}^{q/2} w$, and the frame R being $P' * l \mapsto_{U}^{q/2} w$. Clearly,
$P'$ is not needed for proving the load, so it can be moved into the frame. More
interestingly, since loading the contents of $l$ requires a read permission $l \mapsto_{U}^{p} w$,
with $p \in (0, 1]$, we can split the hypothesis $l \mapsto_{U}^{q} w$ into two halves and move one
into the frame. Below we will see why that matters.

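If we can symbolically execute one of the operands of a binary expression
$e_1 \circledcirc e_2$, say $e_1$ in P, and find a symbolic execution tuple $(w_1, Q, R)$, then we
can use the following admissible rule:


\[
\frac{R \vdash \mathsf{wp}\ e_2\ \{w_2.\ Q \mathrel{-\!\!*} \Phi\,(w_1\ [\![\circledcirc]\!]\ w_2)\}}
     {P \vdash \mathsf{wp}\ (e_1 \circledcirc e_2)\ \{\Phi\}}
\]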


This rule has a much more algorithmic flavor than the rule WP-BIN-OP. Applying 
the above rule now boils down to finding such a tuple (w, Q, R), instead of having 
to infer postconditions for both operands, as we need to do to apply WP-BIN-OP. 
For instance, given an expression $(*l) \circledcirc e_2$ and a precondition $P' * l \mapsto_{U}^{q} v$,
we can derive the following rule:


\[
\frac{P' * l \mapsto_{U}^{q/2} v \vdash \mathsf{wp}\ e_2\ \{w_2.\ l \mapsto_{U}^{q/2} v \mathrel{-\!\!*} \Phi\,(v\ [\![\circledcirc]\!]\ w_2)\}}
     {P' * l \mapsto_{U}^{q} v \vdash \mathsf{wp}\ (*l \circledcirc e_2)\ \{\Phi\}}
\]


This rule matches the intuition that only a fraction of the permission $l \mapsto_{U}^{q} v$ is
needed to prove a load $*l$, so that the remaining half of the permission can be
used to prove the correctness of $e_2$ (which may contain other loads of $l$).
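
As a worked instance of the derived rule above, take $q = 1$ and let $e_2$ itself be the load $*l$. This particular choice is ours and serves only as an illustration:

```latex
\[
\frac{P' * l \mapsto_{U}^{1/2} v \vdash \mathsf{wp}\ (*l)\ \{w_2.\ l \mapsto_{U}^{1/2} v \mathrel{-\!\!*} \Phi\,(v\ [\![\circledcirc]\!]\ w_2)\}}
     {P' * l \mapsto_{U}^{1} v \vdash \mathsf{wp}\ (*l \circledcirc *l)\ \{\Phi\}}
\]
```

The premise still holds half of the permission for $l$, which is enough to justify the second load. A single unsplit permission would have been consumed entirely by the first operand.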


5.2 An Algorithm for Symbolic Execution 


For an arbitrary expression e and a proposition P, it is unlikely that one can find 
such a symbolic execution tuple (w, Q, R) automatically. However, for a certain 
class of C expressions that appear in actual programs we can compute a choice 
of such a tuple. To illustrate our approach, we will define such an algorithm for 
a small subset expr of C expressions described by the following grammar: 


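\[
\bar{e} \in \mathsf{expr} ::= v \mid {*\bar{e}} \mid \bar{e}_1 = \bar{e}_2 \mid \bar{e}_1 \circledcirc \bar{e}_2.
\]


We keep this subset small to ease presentation. In Sect. 7 we explain how to
extend the algorithm to cover the sequenced bind operator $x \leftarrow \bar{e}_1; \bar{e}_2$.

Moreover, to implement symbolic execution, we cannot manipulate arbitrary
separation logic propositions. We thus restrict to symbolic heaps ($m \in \mathsf{sheap}$),
which are defined as finite partial functions
$\mathsf{Loc} \rightharpoonup_{\mathrm{fin}} (\{L, U\} \times (0, 1] \times \mathsf{val})$
representing a collection of points-to propositions:


\[
[\![m]\!] \triangleq \mathop{\ast}_{\substack{l \in \mathrm{dom}(m) \\ m(l) = (\xi, q, v)}} l \mapsto_{\xi}^{q} v.
\]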
We use the following operations on symbolic heaps:


— $m[l \mapsto (\xi, q, v)]$ sets the entry $m(l)$ to $(\xi, q, v)$;

— $m \setminus \{l \mapsto \_\}$ removes the entry $m(l)$ from $m$;

— $m_1 \sqcup m_2$ merges the symbolic heaps $m_1$ and $m_2$ in such a way that for each
$l \in \mathrm{dom}(m_1) \cup \mathrm{dom}(m_2)$, we have:


\[
(m_1 \sqcup m_2)(l) =
\begin{cases}
m_i(l) & \text{if } l \in \mathrm{dom}(m_i) \text{ and } l \notin \mathrm{dom}(m_j) \\
(\xi \vee \xi',\ q + q',\ v) & \text{if } m_1(l) = (\xi, q, v) \text{ and } m_2(l) = (\xi', q', \_).
\end{cases}
\]

With this representation of propositions, we define the symbolic execution
algorithm as a partial function
$\mathsf{forward} : (\mathsf{sheap} \times \mathsf{expr}) \rightharpoonup (\mathsf{val} \times \mathsf{sheap} \times \mathsf{sheap})$,
which satisfies the specification stated in Sect. 5.1, i.e., for which the following
holds:


Theorem 5.1. Given an expression e and a symbolic heap m, if forward(m, e)
returns a tuple $(w, m_1^{o}, m_1)$, then $[\![m]\!] \vdash \mathsf{wp}\ e\ \{v.\ v = w * [\![m_1^{o}]\!]\} * [\![m_1]\!]$.
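
The merge operation on symbolic heaps can be sketched in a few lines. This is an illustration only: it assumes a symbolic heap is encoded as a Python dictionary from location names to (level, fraction, value) triples, and that the join follows the ordering L < U stated earlier. The names join and merge are hypothetical.

```python
from fractions import Fraction


def join(level1, level2):
    """Join of access levels, following the ordering L < U from the text."""
    return "U" if "U" in (level1, level2) else "L"


def merge(m1, m2):
    """Merge two symbolic heaps: shared locations add fractions and join levels."""
    out = dict(m1)
    for loc, (level2, q2, v2) in m2.items():
        if loc in out:
            level1, q1, v1 = out[loc]
            out[loc] = (join(level1, level2), q1 + q2, v1)  # value taken from m1
        else:
            out[loc] = (level2, q2, v2)
    return out


m1 = {"l": ("U", Fraction(1, 2), 5)}
m2 = {"l": ("U", Fraction(1, 4), 5), "k": ("L", Fraction(1), 0)}
print(merge(m1, m2))
```

Exact rational arithmetic (Fraction) is used so that repeatedly halved permissions add back up without rounding.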
The definition of the algorithm is shown in Fig. 7. Given a tuple $(m, e)$, a call
to forward(m, e) either returns a tuple $(v, m^{o}, m')$ or fails, which happens either
when $e \notin \mathsf{expr}$ or when one of the intermediate steps of the computation fails. In the
latter cases, we write $\mathsf{forward}(m, e) = \bot$.

The algorithm proceeds by case analysis on the expression e. In each case,
the expected output is described by the equation $\mathsf{forward}(m, e) = (v, m^{o}, m')$.
The results of the intermediate computations appear on separate lines under the
clause "where ...". If one of the corresponding equations does not hold, e.g.,
a recursive call fails, then the failure is propagated. Let us now explain the case
for the assignment operator.

If e is an assignment operator $e_1 = e_2$, we first evaluate $e_1$ and then $e_2$.
Fixing the order of symbolic execution from left to right does not compromise the
non-determinism underlying the C semantics of binary operators. Indeed, when
$\mathsf{forward}(m, e_1) = (v_1, m_1^{o}, m_1)$, we evaluate the expression $e_2$ using the frame
$m_1$, i.e., only the resources of $m$ that remain after the execution of $e_1$. When
$\mathsf{forward}(m, e_1) = (l, m_1^{o}, m_1)$, with $l \in \mathsf{Loc}$, and $\mathsf{forward}(m_1, e_2) = (v_2, m_2^{o}, m_2)$,
the function $\mathsf{delete\_full\_2}(l, m_2, m_1^{o} \sqcup m_2^{o})$ checks whether $(m_2 \sqcup m_1^{o} \sqcup m_2^{o})(l)$


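\[
\begin{aligned}
\mathsf{forward}(m, v) &\triangleq (v, \emptyset, m) \\
\mathsf{forward}(m, e_1 \circledcirc e_2) &\triangleq (v_1\ [\![\circledcirc]\!]\ v_2,\ m_1^{o} \sqcup m_2^{o},\ m_2) \\
  &\quad \text{where } (v_1, m_1^{o}, m_1) = \mathsf{forward}(m, e_1) \\
  &\quad \phantom{\text{where }} (v_2, m_2^{o}, m_2) = \mathsf{forward}(m_1, e_2) \\
\mathsf{forward}(m, *e_1) &\triangleq (w,\ m_2^{o} \sqcup \{l \mapsto (U, q, w)\},\ m_2) \\
  &\quad \text{where } (l, m_1^{o}, m_1) = \mathsf{forward}(m, e_1) \quad \text{provided } l \in \mathsf{Loc} \\
  &\quad \phantom{\text{where }} (m_2, m_2^{o}, q, w) = \mathsf{delete\_frac\_2}(l, m_1, m_1^{o}) \\
\mathsf{forward}(m, e_1 = e_2) &\triangleq (v_2,\ m_3^{o} \sqcup \{l \mapsto (L, 1, v_2)\},\ m_3) \\
  &\quad \text{where } (l, m_1^{o}, m_1) = \mathsf{forward}(m, e_1) \quad \text{provided } l \in \mathsf{Loc} \\
  &\quad \phantom{\text{where }} (v_2, m_2^{o}, m_2) = \mathsf{forward}(m_1, e_2) \\
  &\quad \phantom{\text{where }} (m_3, m_3^{o}) = \mathsf{delete\_full\_2}(l, m_2, m_1^{o} \sqcup m_2^{o}) \\
\mathsf{forward}(m, e) &\triangleq \bot \quad \text{if } e \notin \mathsf{expr}
\end{aligned}
\]


Auxiliary functions:
\[
\mathsf{delete\_frac\_2}(l, m_1, m_2) \triangleq
\begin{cases}
(m_1[l \mapsto (U, q/2, v)],\ m_2,\ q/2,\ v) & \text{if } m_1(l) = (U, q, v) \\
(m_1,\ m_2[l \mapsto (U, q/2, v)],\ q/2,\ v) & \text{if } m_1(l) \neq (U, \_, \_),\ m_2(l) = (U, q, v) \\
\bot & \text{otherwise}
\end{cases}
\]
\[
\mathsf{delete\_full\_2}(l, m_1, m_2) \triangleq (m_1 \setminus \{l \mapsto \_\},\ m_2 \setminus \{l \mapsto \_\})
\quad \text{where } (U, 1, \_) = (m_1 \sqcup m_2)(l)
\]


Fig. 7. The definition of the symbolic executor.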
contains the write permission $l \mapsto_{U} \_$. If this holds, it removes the location $l$, so
that the write permission is now consumed. Finally, we merge $\{l \mapsto (L, 1, v_2)\}$
with the output heap $m_3^{o}$, so that after assignment, the write permission $l \mapsto_{L} v_2$
is given back in a locked state.
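
The cases of Fig. 7 can be transcribed into a small executable sketch. This is an illustration under stated assumptions, not the Coq implementation: expressions are encoded as tagged tuples, symbolic heaps as dictionaries from locations to (level, fraction, value) triples, failure ($\bot$) as None, and the class Loc as well as the tuple encoding are hypothetical.

```python
from fractions import Fraction
import operator


class Loc(str):
    """Hypothetical location type; every other value is a non-location."""


def join(a, b):
    # Join of access levels, following the ordering L < U from the text.
    return "U" if "U" in (a, b) else "L"


def merge(m1, m2):
    # Merge of symbolic heaps: add fractions and join levels on shared locations.
    out = dict(m1)
    for l, (lvl, q, v) in m2.items():
        if l in out:
            lvl1, q1, v1 = out[l]
            out[l] = (join(lvl1, lvl), q1 + q, v1)
        else:
            out[l] = (lvl, q, v)
    return out


def delete_frac_2(l, m1, m2):
    # Halve an unlocked permission for l, looking in m1 first, then in m2.
    if l in m1 and m1[l][0] == "U":
        _, q, v = m1[l]
        return ({**m1, l: ("U", q / 2, v)}, m2, q / 2, v)
    if l in m2 and m2[l][0] == "U":
        _, q, v = m2[l]
        return (m1, {**m2, l: ("U", q / 2, v)}, q / 2, v)
    return None


def delete_full_2(l, m1, m2):
    # Succeeds only if the merged heap holds the full unlocked permission for l.
    entry = merge(m1, m2).get(l)
    if entry is None or entry[0] != "U" or entry[1] != 1:
        return None
    return ({k: e for k, e in m1.items() if k != l},
            {k: e for k, e in m2.items() if k != l})


def forward(m, e):
    """Return (value, output heap, frame), or None when symbolic execution fails."""
    tag = e[0]
    if tag == "val":
        return (e[1], {}, m)
    if tag == "binop":
        _, op, e1, e2 = e
        r1 = forward(m, e1)
        r2 = forward(r1[2], e2) if r1 is not None else None
        if r2 is None:
            return None
        return (op(r1[0], r2[0]), merge(r1[1], r2[1]), r2[2])
    if tag == "load":
        r1 = forward(m, e[1])
        if r1 is None or not isinstance(r1[0], Loc):
            return None
        l, mo1, m1 = r1
        d = delete_frac_2(l, m1, mo1)
        if d is None:
            return None
        m2, mo2, q, w = d
        return (w, merge(mo2, {l: ("U", q, w)}), m2)
    if tag == "store":
        r1 = forward(m, e[1])
        if r1 is None or not isinstance(r1[0], Loc):
            return None
        l, mo1, m1 = r1
        r2 = forward(m1, e[2])
        if r2 is None:
            return None
        v2, mo2, m2 = r2
        d = delete_full_2(l, m2, merge(mo1, mo2))
        if d is None:
            return None
        m3, mo3 = d
        return (v2, merge(mo3, {l: ("L", Fraction(1), v2)}), m3)
    return None  # e is not in the supported subset


l = Loc("l")
heap = {l: ("U", Fraction(1), 5)}
load = ("load", ("val", l))
store3 = ("store", ("val", l), ("val", 3))
print(forward(heap, ("binop", operator.add, load, load))[0])   # 10
print(forward(heap, ("binop", operator.add, store3, load)))    # None
```

In the first call, both loads of l succeed because each takes only half of the permission that is still available. In the second call, the store leaves l locked with no unlocked permission in the frame, so the subsequent load fails; this corresponds to the sequence point violation in (l = 3) + *l.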


6 A Verification Condition Generator for AMC 


To establish correctness of programs, we need to prove goals $P \vdash \mathsf{wp}\ e\ \{\Phi\}$. To
prove such a goal, one has to repeatedly apply the rules for weakest preconditions,
intertwined with logical reasoning. In this section we will automate this process
for AMC by means of a verification condition generator (vcgen).

As a first attempt to define a vcgen, one could try to recurse over the
expression e and apply the rules in Fig. 3 eagerly. This would turn the goal into a
separation logic proposition that subsequently should be solved. However, as we
pointed out in Sect. 5.1, the resulting separation logic proposition will be very
difficult to prove, either interactively or automatically, due to the existentially
quantified postconditions that appear because of uses of the rules for binary
operators (e.g., WP-BIN-OP). We then proposed alternative rules that avoid the
need for existential quantifiers. These rules look like:


\[
\frac{R \vdash \mathsf{wp}\ e_2\ \{v_2.\ Q \mathrel{-\!\!*} \Phi\,(v_1\ [\![\circledcirc]\!]\ v_2)\}}
     {P \vdash \mathsf{wp}\ (e_1 \circledcirc e_2)\ \{\Phi\}}
\]


To use this rule, the crux is to symbolically execute $e_1$ with precondition P into
a symbolic execution tuple $(v_1, Q, R)$, which, as we alluded to, can be computed
automatically by means of the symbolic executor if $e_1 \in \mathsf{expr}$ (Sect. 5.2).

We can only use the symbolic executor if P is of the shape $[\![m]\!]$ for a symbolic
heap m. However, in actual program verification, the precondition P is hardly
ever of that shape. In addition to a series of points-to connectives (as described by
a symbolic heap), we may have arbitrary propositions of separation logic, such as
pure facts, abstract predicates, nested Hoare triples, Iris ghost state, etc. These
propositions may be needed to prove intermediate verification conditions, e.g.,
for function calls. As such, to effectively apply the above rule, we need to separate
our precondition P into two parts: a symbolic heap $[\![m]\!]$ and a remainder $P'$.
Assuming $\mathsf{forward}(m, e_1) = (v_1, m_1^{o}, m_1)$, we may then use the following rule:


\[
\frac{P' * [\![m_1]\!] \vdash \mathsf{wp}\ e_2\ \{v_2.\ [\![m_1^{o}]\!] \mathrel{-\!\!*} \Phi\,(v_1\ [\![\circledcirc]\!]\ v_2)\}}
     {P' * [\![m]\!] \vdash \mathsf{wp}\ (e_1 \circledcirc e_2)\ \{\Phi\}}
\]


It is important to notice that by applying this rule, the remainder $P'$ remains
in our precondition as is, but the symbolic heap is changed from $[\![m]\!]$ into $[\![m_1]\!]$,
i.e., into the frame that we obtained by symbolically executing $e_1$.

It should come as no surprise that we can automate this process, by applying
rules, such as the one we have given above, recursively, and threading through
symbolic heaps. Formally, we do this by defining the vcgen as a total function
$\mathsf{vcg} : (\mathsf{sheap} \times \mathsf{expr} \times (\mathsf{sheap} \to \mathsf{val} \to \mathsf{Prop})) \to \mathsf{Prop}$, where Prop is the type of
propositions of our logic. The definition of vcg is given in Fig. 8. Before explaining
the details, let us state its correctness theorem:


Theorem 6.1. Given an expression e, a symbolic heap m, and a postcondition
$\Phi$, the following statement holds:


\[
\frac{P' \vdash \mathsf{vcg}(m, e, \lambda m'\, v.\ [\![m']\!] \mathrel{-\!\!*} \Phi\, v)}
     {P' * [\![m]\!] \vdash \mathsf{wp}\ e\ \{\Phi\}}
\]


This theorem reflects the general shape of the rules we previously described.
We start off with a goal $P' * [\![m]\!] \vdash \mathsf{wp}\ e\ \{\Phi\}$, and after using the vcgen, we should
prove that the generated goal follows from $P'$. It is important to note that the
continuation in the vcgen is not only parameterized by the return value, but also
by a symbolic heap corresponding to the resources that remain. To get these
resources back, the vcgen is initiated with the continuation $\lambda m'\, v.\ [\![m']\!] \mathrel{-\!\!*} \Phi\, v$.

Most clauses of the definition of the vcgen (Fig. 8) follow the approach we
described so far. For unary expressions like load we generate a condition that
corresponds to the weakest precondition rule. For binary expressions, we
symbolically execute either operand, and proceed recursively in the other. There are
a number of important bells and whistles that we will discuss now.


Sequencing. In the case of sequenced binds $x \leftarrow e_1; e_2$, we recursively compute
the verification condition for $e_1$ with the continuation:


\[
\lambda m'\, v.\ \mathbb{U}\,(\mathsf{vcg}(\mathsf{unlock}(m'), e_2[v/x], K)).
\]


Due to a sequence point, all locations modified by $e_1$ will be in the unlocked state
after it is finished executing. Therefore, in the recursive call to $e_2$ we unlock all
locations in the symbolic heap (cf. $\mathsf{unlock}(m')$), and we include a $\mathbb{U}$ modality
in the continuation. The $\mathbb{U}$ modality is crucial so that the resources that are not
given to the vcgen (the remainder $P'$ in Theorem 6.1) can also be unlocked.
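
The effect of unlock on a symbolic heap can be sketched as follows. The sketch assumes, as an illustration only, that a symbolic heap is a Python dictionary from location names to (level, fraction, value) triples.

```python
from fractions import Fraction


def unlock(m):
    """Set the access level of every entry to U, keeping fraction and value."""
    return {loc: ("U", q, v) for loc, (_level, q, v) in m.items()}


after_e1 = {"l": ("L", Fraction(1), 4), "k": ("U", Fraction(1, 2), 0)}
print(unlock(after_e1))
```

Only the access levels change. The fractions and values computed while executing the first expression are carried over unchanged to the second.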


Handling Failure. In the case of binary operators $e_1 \circledcirc e_2$, it could be that
the symbolic executor fails on both $e_1$ and $e_2$, because neither of the arguments
were of the right shape (i.e., not an element of expr), or the required resources
were not present in the symbolic heap. In this case the vcgen generates the goal
of the form $[\![m]\!] \mathrel{-\!\!*} \mathsf{wp}\ (e_1 \circledcirc e_2)\ \{K_{\mathsf{ret}}\}$ where $K_{\mathsf{ret}} \triangleq \lambda w.\ \exists m'.\ [\![m']\!] * K\, m'\, w$.
What appears here is that the current symbolic heap $[\![m]\!]$ is given back to the
user, which they can use to prove the weakest precondition of $e_1 \circledcirc e_2$ by hand.
Through the postcondition $\exists m'.\ [\![m']\!] * K\, m'\, w$ the user can resume the vcgen,
by choosing a new symbolic heap $m'$ and invoking the continuation $K\, m'\, w$.

For assignments $e_1 = e_2$ we have a similar situation. Symbolic execution of
both $e_1$ and $e_2$ may fail, and then we generate a goal similar to the one for binary
operators. If the location $l$ that we wish to assign to is not in the symbolic heap,
we use the continuation $[\![m]\!] \mathrel{-\!\!*} \exists w.\ l \mapsto_{U} w * (l \mapsto_{L} v \mathrel{-\!\!*} K_{\mathsf{ret}}\, v)$. As before,
the user gets back the current symbolic heap $[\![m]\!]$, and could resume the vcgen
through the postcondition $K_{\mathsf{ret}}\, v$ by picking a new symbolic heap.
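\[
\mathsf{vcg}(m, v, K) \triangleq K\, m\, v
\]
\[
\mathsf{vcg}(m, e_1 \circledcirc e_2, K) \triangleq
\begin{cases}
\mathsf{vcg}(m_2, e_2, \lambda m'\, v_2.\ K\,(m' \sqcup m^{o})\,(v_1\ [\![\circledcirc]\!]\ v_2)) & \text{if } \mathsf{forward}(m, e_1) = (v_1, m^{o}, m_2) \\
\mathsf{vcg}(m_1, e_1, \lambda m'\, v_1.\ K\,(m' \sqcup m^{o})\,(v_1\ [\![\circledcirc]\!]\ v_2)) & \text{if } \mathsf{forward}(m, e_1) = \bot \text{ and } \mathsf{forward}(m, e_2) = (v_2, m^{o}, m_1) \\
[\![m]\!] \mathrel{-\!\!*} \mathsf{wp}\ (e_1 \circledcirc e_2)\ \{K_{\mathsf{ret}}\} & \text{otherwise}
\end{cases}
\]
\[
\mathsf{vcg}(m, *e, K) \triangleq \mathsf{vcg}(m, e, K')
\quad \text{with } K' \triangleq \lambda m\, l.
\begin{cases}
K\, m\, w & \text{if } l \in \mathsf{Loc} \text{ and } m(l) = (U, q, w) \\
[\![m]\!] \mathrel{-\!\!*} \exists w\, q.\ l \mapsto_{U}^{q} w * (l \mapsto_{U}^{q} w \mathrel{-\!\!*} K_{\mathsf{ret}}\, w) & \text{otherwise}
\end{cases}
\]
\[
\mathsf{vcg}(m, e_1 = e_2, K) \triangleq
\begin{cases}
\mathsf{vcg}(m_2, e_2, \lambda m'\, v.\ K'\,(m' \sqcup m^{o})\,(l, v)) & \text{if } \mathsf{forward}(m, e_1) = (l, m^{o}, m_2) \\
\mathsf{vcg}(m_1, e_1, \lambda m'\, l.\ K'\,(m' \sqcup m^{o})\,(l, v)) & \text{if } \mathsf{forward}(m, e_1) = \bot \text{ and } \mathsf{forward}(m, e_2) = (v, m^{o}, m_1) \\
[\![m]\!] \mathrel{-\!\!*} \mathsf{wp}\ (e_1 = e_2)\ \{K_{\mathsf{ret}}\} & \text{otherwise}
\end{cases}
\]
\[
\text{with } K' \triangleq \lambda m\, (l, v).
\begin{cases}
K\,(m' \sqcup \{l \mapsto (L, 1, v)\})\, v & \text{if } l \in \mathsf{Loc} \text{ and } \mathsf{delete\_full}(l, m) = m' \\
[\![m]\!] \mathrel{-\!\!*} \exists w.\ l \mapsto_{U} w * (l \mapsto_{L} v \mathrel{-\!\!*} K_{\mathsf{ret}}\, v) & \text{otherwise}
\end{cases}
\]
\[
\mathsf{vcg}(m, x \leftarrow e_1; e_2, K) \triangleq \mathsf{vcg}(m, e_1, \lambda m'\, v.\ \mathbb{U}\,(\mathsf{vcg}(\mathsf{unlock}(m'), e_2[v/x], K)))
\]

Auxiliary functions:
\[
K_{\mathsf{ret}} : \mathsf{val} \to \mathsf{Prop} \triangleq \lambda w.\ (\exists m'.\ [\![m']\!] * K\, m'\, w)
\qquad
\mathsf{unlock}(m) \triangleq \bigsqcup_{\substack{l \in \mathrm{dom}(m) \\ m(l) = (\_, q, v)}} \{l \mapsto (U, q, v)\}
\]


Fig. 8. Selected cases of the verification condition generator.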
7 Discussion 


Extensions of the Language. The memory model that we have presented 
in this paper was purposely oversimplified. In Coq, the memory model for AMC 
additionally supports mutable local variables, arrays, and pointer arithmetic. 
Adding support for these features was relatively easy and required only local 
changes to the definitional semantics and the separation logic. 

For implementing mutable local variables, we tag each location with a 
Boolean that keeps track of whether it is an allocated or a local variable. That 
way, we can forbid deallocating local variables using the free(—) operator. 

Our extended memory model is block/offset-based like CompCert's memory
model [38]. Pointers are not simply represented as locations, but as pairs $(\ell, i)$,
where $\ell$ is a HeapLang reference to a memory block containing a list of values,
and $i$ is an offset into that block. The points-to connectives of our separation
logic then correspondingly range over block/offset-based pointers.


Symbolic Execution of Sequence Points. We adapt our forward algorithm
to handle sequenced bind operators $x \leftarrow e_1; e_2$. The subtlety lies in supporting
nested sequenced binds. For example, in an expression $(x \leftarrow e_1; e_2) + e_3$ the
postcondition of $e_1$ can be used (along with the frame) for the symbolic execution
of $e_2$, but it cannot be used for the symbolic execution of $e_3$. In order to solve
this, our forward algorithm takes a stack of symbolic heaps as an input, and
returns a stack of symbolic heaps (of the same length) as a frame. All the cases
shown in Fig. 7 are easily adapted w.r.t. this modification, and the following
definition captures the case for the sequence point bind:


\[
\begin{aligned}
\mathsf{forward}(\vec{m}, x \leftarrow e_1; e_2) &\triangleq (v_2,\ m_2^{o} \sqcup m',\ \vec{m}_2) \\
  &\quad \text{where } (v_1, m_1^{o}, \vec{m}_1) = \mathsf{forward}(\vec{m}, e_1) \\
  &\quad \phantom{\text{where }} (v_2, m_2^{o}, m' :: \vec{m}_2) = \mathsf{forward}(\mathsf{unlock}(m_1^{o}) :: \vec{m}_1, e_2[v_1/x])
\end{aligned}
\]


Shared Resource Invariants. As in Krebbers’s logic [29], the rules for binary 
operators in Fig. 3 require the resources to be separated into disjoint parts for the 
subexpressions. If both sides of a binary operator are function calls, then they 
can only share read permissions despite that both function calls are executed 
atomically. Following Krebbers, we address this limitation by adding a shared 
resource invariant R to our weakest preconditions and add the following rules: 


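\[
\frac{R_1 * \mathsf{wp}_{R_1 * R_2}\ e\ \{v.\ R_1 \mathrel{-\!\!*} \Phi\, v\}}
     {\mathsf{wp}_{R_2}\ e\ \{\Phi\}}
\qquad
\frac{f(x)\{e\}\ \text{defined} \qquad R \mathrel{-\!\!*} \mathbb{U}\,(\mathsf{wp}_{\mathsf{True}}\ e[x/v]\ \{w.\ R * \Phi\, w\})}
     {\mathsf{wp}_{R}\ f(v)\ \{\Phi\}}
\]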


To temporarily transfer resources into the invariant, one can use the first 
rule. Because function calls are not interleaved, one can use the last rule to gain 
access to the shared resource invariant for the duration of the function call. 

Our handling of shared resource invariants generalizes the treatment by Kreb- 
bers: using custom ghost state in Iris we can endow the resource invariant with a 
protocol. This allows us to verify examples that were previously impossible [29]: 


int f(int *p, int y) { return (*p = y); } 
int main() { int x; f(&x, 3) + f(&x, 4); return x; } 


Krebbers could only prove that main returns 0, 3 or 4, whereas we can prove 
it returns 3 or 4 by combining resource invariants with Iris’s ghost state. 
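
The two possible outcomes can be reproduced by enumerating the evaluation orders of the two calls. The following sketch is only an illustration of the program's behavior, not of the proof: it assumes, as stated above, that function calls are not interleaved, and it models the local variable x as a hypothetical one-field dictionary.

```python
from itertools import permutations


def run_main(order):
    """Run the calls f(&x, 3) and f(&x, 4) in the given order; return the final x."""
    x = {"value": None}  # models the uninitialized local variable x

    def f(cell, y):
        # Models: int f(int *p, int y) { return (*p = y); }
        cell["value"] = y
        return y

    results = [f(x, y) for y in order]
    assert sum(results) == 7  # the sum itself does not depend on the order
    return x["value"]


outcomes = {run_main(order) for order in permutations((3, 4))}
print(sorted(outcomes))  # [3, 4]
```

Whichever call runs last determines the final value of x, so main returns either 3 or 4 and nothing else.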


Implementation in Coq. In the Coq development [18] we have: 


— Defined AMC with the extensions described above, as well as the monadic 
combinators, as a shallow embedding on top of Iris’s HeapLang [21,25]. 

— Modeled the separation logic for AMC and the monadic combinators as a 
shallow embedding on top of the Iris’s program logic for HeapLang. 
— Implemented the symbolic executor and vcgen as computable Coq functions,
and proved their soundness w.r.t. our separation logic. 

— Turned the verification condition generator into a tactic that integrates into 
the Iris Proof Mode/MoSeL framework [32,34]. 


This last point allowed us to leverage the existing machinery for separation
logic proofs in Coq. Firstly, we get basic building blocks for implementing the
vcgen tactic for free. Secondly, when the vcgen is unable to solve the goal, one
can use the Iris Proof Mode/MoSeL tactics to help out in a convenient manner.

To implement the symbolic executor and vcgen, we had to reify the terms 
and values of AMC. To see why reification is needed, consider the data type for 
symbolic heaps, which uses locations as keys. In proofs, those locations appear 
as universally quantified variables. To compute using these, we need to reify 
them into some symbolic representation. We have implemented the reification 
mechanism using type classes, following Spitters and van der Weegen [47]. 

With all the mechanics in place, our vcgen is able to significantly aid us.
Consider the following program that copies the contents of one array into another:


int arraycopy(int *p, int *q, int n) {
  int *pend = p + n;
  while (p < pend) { *(p++) = *(q++); }
}


We proved $\{p \mapsto \vec{x} * q \mapsto \vec{y} * (|\vec{x}| = |\vec{y}| = n)\}$ arraycopy(p,q,n) $\{p \mapsto \vec{y} * q \mapsto \vec{y}\}$ in
11 lines of Coq code. The vcgen can automatically process the program up until 
the while loop. At that point, the user has to manually perform an induction on 
the array, providing a suitable induction hypothesis. The vcgen is then able to 
discharge the base case automatically. In the inductive case, it will automatically 
process the program until the next iteration of the while loop, where the user 
has to apply the induction hypothesis. 


8 Related Work 


C Semantics. There has been a considerable body of work on formal semantics 
for the C language, including several large projects that aimed to formalize sub- 
stantial subsets of C [17,20,30,37,41,44], and projects that focused on specific 
aspects like its memory model [10, 13,27, 28,31,38,40,41], weak memory concur- 
rency [4,36,43], non-local control flow [35], verified compilation [37,48], etc. 

The focus of this paper—non-determinism in C expressions—has been treated 
formally a number of times, notably by Norrish [44], Ellison and Rosu [17], 
Krebbers [31], and Memarian et al. [41]. The first three have in common that they 
model the sequence point restriction by keeping track of the locations that have 
been written to. The treatment of sequence points in our definitional semantics 
is closely inspired by the work of Ellison and Rosu [17], which resembles closely 
what is in the C standard. Krebbers [31] used a more restrictive version of the 
semantics by Ellison and Rosu—he assigned undefined behavior in some corner 
cases to ease the soundness theorem of his logic. We directly proved soundness 
of the logic w.r.t. the more faithful model by Ellison and Rosu. 
Memarian et al. [41] give a semantics to C by elaboration into a language they
call Core. Unspecified evaluation order in Core is modeled using an unseq
operation, which is similar to our $\parallel$ operation. Compared to our translation, Core
is much closer to C (it has function calls, memory operations, etc. as primitives,
while we model them with monadic combinators), and supports concurrency.


Reasoning Tools and Program Logics for C. Apart from formalizing the 
semantics of C, there have been many efforts to create reasoning tools for the C 
language in one way or another. There are standalone tools, like VeriFast [23], 
VCC [12], and the Jessie plugin of Frama-C [42], and there are tools built on top 
of general purpose proof assistants like VST [1,10] in Coq, or AutoCorres [19] in 
Isabelle/HOL. Although, admittedly, all of these tools cover larger subsets of C 
than we do, as far as we know, they all ignore non-determinism in expressions. 

There are a few exceptions. Norrish proved confluence for a certain class of 
C expressions [45]. Such a confluence result may be used to justify proofs in a 
tool that does not have an underlying non-deterministic semantics. 

Another exception is the separation logic for non-determinism in C by Kreb- 
bers [29]. Our work is inspired by his, but there are several notable differences: 


— We have proved soundness with respect to a definitional semantics for a subset 
of C. We believe that this approach is more modular, since the semantics can 
be specified at a higher level of abstraction. 

— We have built our logic on top of the Iris framework. This makes the devel- 
opment more modular (since we can use all the features as well as the Coq 
infrastructure of Iris) and more expressive (as shown in Sect. 7). 

— Krebbers's logic had no automation like our vcgen, so one had to subdivide
resources between subexpressions manually at every step. It also lacked
tactic support for carrying out proofs manually. Our logic is redesigned to
get such support from the Iris Proof Mode/MoSeL framework.


To handle missing features of C as part of our vcgen, we plan to explore 
approaches by other verification projects in proof assistants. A notable example 
of such a project is VST, which supports machine arithmetic [16] and data types 
like structs and unions [10] as part of its tactics for symbolic execution. 


Separation Logic and Symbolic Execution. In their seminal work, Berdine 
et al. [5] demonstrate the application of symbolic execution to automated rea- 
soning in separation logic. In their setting, frame inference is used to perform 
symbolic execution of function calls. The frame has to be computed when the call 
site has more resources than needed to invoke a function. In our setting we com- 
pute frames for subexpressions, which, unlike functions, do not have predefined 
specifications. Due to that, we have to perform frame inference simultaneously 
with symbolic execution. The symbolic execution algorithm of Berdine et al. can 
handle inductive predicates, and can be extended with shape analysis [15]. We 
do not support such features, and leave them to future work. 

Caper [14] is a tool for automated reasoning in concurrent separation logic, 
and it also deals with non-determinism, although the nature of non-determinism in 
Caper is different. Non-determinism in Caper arises due to branching on unknown 
conditionals and due to multiple possible ways to apply ghost state related rules 
(rules pertaining to abstract regions and guards). The former cause is tackled by 
considering sets of symbolic execution traces, and the latter is resolved by employ- 
ing heuristics based on bi-abduction [9]. Applications of abductive reasoning to 
our approach to symbolic execution are left for future work. 

Recently, Bannister et al. [2,3] proposed a new separation logic connective for 
performing forwards reasoning whilst avoiding frame inference. This approach, 
however, is aimed at sequential deterministic programs, focusing on a notion of 
partial correctness that allows for failed executions. Another approach to veri- 
fication of sequential stateful programs is based on characteristic formulae [11]. 
A stateful program is transformed into a higher-order logic predicate, implicitly 
encoding the frame rule. The resulting formula is then proved by a user in Coq. 

When implementing a vcgen in a proof assistant (see e.g., [10,39]) it is com- 
mon to let the vcgen return a new goal when it gets stuck, from which the 
user can help out and call back the vcgen. The novelty of our work is that this 
approach is applied to operations that are called in parallel. 


Acknowledgments. We are grateful to Gregory Malecha and the anonymous reviewers
for their comments and suggestions. This work was supported by the Netherlands
Organisation for Scientific Research (NWO), project numbers STW.14319 (first
and second author) and 016.Veni.192.259 (third author).


References 


1. Appel, A.W. (ed.): Program Logics for Certified Compilers. Cambridge University 
Press, New York (2014) 

2. Bannister, C., Höfner, P.: False failure: creating failure models for separation logic.
In: Desharnais, J., Guttmann, W., Joosten, S. (eds.) RAMiCS 2018. LNCS, vol.
11194, pp. 263-279. Springer, Cham (2018).
https://doi.org/10.1007/978-3-030-02149-8_16

3. Bannister, C., Höfner, P., Klein, G.: Backwards and forwards with separation
logic. In: Avigad, J., Mahboubi, A. (eds.) ITP 2018. LNCS, vol. 10895, pp. 68-87.
Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94821-8_5

4. Batty, M., Owens, S., Sarkar, S., Sewell, P., Weber, T.: Mathematizing C++ con- 
currency. In: POPL, pp. 55-66 (2011) 

5. Berdine, J., Calcagno, C., O’Hearn, P.W.: Symbolic execution with separation 
logic. In: Yi, K. (ed.) APLAS 2005. LNCS, vol. 3780, pp. 52-68. Springer, 
Heidelberg (2005). https://doi.org/10.1007/11575467_5 

6. Birkedal, L., Bizjak, A.: Lecture Notes on Iris: Higher-Order Concurrent Separation 
Logic, August 2018. https://iris-project.org/tutorial-material.html

7. Bornat, R., Calcagno, C., O’Hearn, P.W., Parkinson, M.J.: Permission accounting 
in separation logic. In: POPL, pp. 259-270 (2005) 

8. Boyland, J.: Checking interference with fractional permissions. In: Cousot, R. (ed.)
SAS 2003. LNCS, vol. 2694, pp. 55-72. Springer, Heidelberg (2003).
https://doi.org/10.1007/3-540-44898-5_4

9. Calcagno, C., Distefano, D., O’Hearn, P.W., Yang, H.: Compositional shape anal- 
ysis by means of bi-abduction. J. ACM 58(6), 26:1-26:66 (2011) 

10. Cao, Q., Beringer, L., Gruetter, S., Dodds, J., Appel, A.W.: VST-Floyd: a separation
logic tool to verify correctness of C programs. JAR 61(1-4), 367-422 (2018)


11. Charguéraud, A.: Characteristic formulae for the verification of imperative
programs. SIGPLAN Not. 46(9), 418-430 (2011)

12. Cohen, E., et al.: VCC: a practical system for verifying concurrent C. In: Berghofer,
S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp.
23-42. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03359-9_2

13. Cohen, E., Moskal, M., Tobies, S., Schulte, W.: A precise yet efficient memory
model for C. ENTCS 254, 85-103 (2009)

14. Dinsdale-Young, T., da Rocha Pinto, P., Andersen, K.J., Birkedal, L.: CAPER -
automatic verification for fine-grained concurrency. In: Yang, H. (ed.) ESOP 2017.
LNCS, vol. 10201, pp. 420-447. Springer, Heidelberg (2017).
https://doi.org/10.1007/978-3-662-54434-1_16

15. Distefano, D., O’Hearn, P.W., Yang, H.: A local shape analysis based on separation
logic. In: Hermanns, H., Palsberg, J. (eds.) TACAS 2006. LNCS, vol. 3920, pp.
287-302. Springer, Heidelberg (2006). https://doi.org/10.1007/11691372_19

16. Dodds, J., Appel, A.W.: Mostly sound type system improves a foundational
program verifier. In: Gonthier, G., Norrish, M. (eds.) CPP 2013. LNCS, vol. 8307, pp.
17-32. Springer, Cham (2013). https://doi.org/10.1007/978-3-319-03545-1_2

17. Ellison, C., Roşu, G.: An executable formal semantics of C with applications. In:
POPL, pp. 533-544 (2012)

18. Frumin, D., Gondelman, L., Krebbers, R.: Semi-automated reasoning about
non-determinism in C expressions: Coq development, February 2019.
https://cs.ru.nl/~dfrumin/wpc/

19. Greenaway, D., Lim, J., Andronick, J., Klein, G.: Don’t sweat the small stuff:
formal verification of C code without the pain. In: PLDI, pp. 429-439 (2014)

20. Hathhorn, C., Ellison, C., Roşu, G.: Defining the undefinedness of C. In: PLDI,
pp. 336-345 (2015)

21. Iris: Iris Project, November 2018. https://iris-project.org/

22. ISO: ISO/IEC 9899-2011: Programming Languages - C. ISO Working Group 14
(2012)

23. Jacobs, B., Smans, J., Piessens, F.: A quick tour of the VeriFast program verifier.
In: Ueda, K. (ed.) APLAS 2010. LNCS, vol. 6461, pp. 304-311. Springer, Heidelberg
(2010). https://doi.org/10.1007/978-3-642-17164-2_21

24. Jung, R., Krebbers, R., Birkedal, L., Dreyer, D.: Higher-order ghost state. In:
ICFP, pp. 256-269 (2016)

25. Jung, R., Krebbers, R., Jourdan, J.H., Bizjak, A., Birkedal, L., Dreyer, D.: Iris from
the ground up: a modular foundation for higher-order concurrent separation logic.
J. Funct. Program. 28, e20 (2018). https://doi.org/10.1017/S0956796818000151

26. Jung, R., et al.: Iris: monoids and invariants as an orthogonal basis for concurrent
reasoning. In: POPL, pp. 637-650 (2015)

27. Kang, J., Hur, C., Mansky, W., Garbuzov, D., Zdancewic, S., Vafeiadis, V.: A
formal C memory model supporting integer-pointer casts. In: POPL, pp. 326-335
(2015)

28. Krebbers, R.: Aliasing restrictions of C11 formalized in Coq. In: Gonthier, G.,
Norrish, M. (eds.) CPP 2013. LNCS, vol. 8307, pp. 50-65. Springer, Cham (2013).
https://doi.org/10.1007/978-3-319-03545-1_4

29. Krebbers, R.: An operational and axiomatic semantics for non-determinism and
sequence points in C. In: POPL, pp. 101-112 (2014)

30. Krebbers, R.: The C standard formalized in Coq. Ph.D. thesis, Radboud University
Nijmegen (2015)

31. Krebbers, R.: A formal C memory model for separation logic. JAR 57(4), 319-387
(2016)


32. Krebbers, R., et al.: MoSeL: a general, extensible modal framework for interactive
proofs in separation logic. PACMPL 2(ICFP), 77:1-77:30 (2018)

33. Krebbers, R., Jung, R., Bizjak, A., Jourdan, J.-H., Dreyer, D., Birkedal, L.: The
Essence of higher-order concurrent separation logic. In: Yang, H. (ed.) ESOP 2017.
LNCS, vol. 10201, pp. 696-723. Springer, Heidelberg (2017).
https://doi.org/10.1007/978-3-662-54434-1_26

34. Krebbers, R., Timany, A., Birkedal, L.: Interactive proofs in higher-order
concurrent separation logic. In: POPL, pp. 205-217 (2017)

35. Krebbers, R., Wiedijk, F.: Separation logic for non-local control flow and block
scope variables. In: Pfenning, F. (ed.) FoSSaCS 2013. LNCS, vol. 7794, pp. 257-272.
Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37075-5_17

36. Lahav, O., Vafeiadis, V., Kang, J., Hur, C., Dreyer, D.: Repairing Sequential
Consistency in C/C++11. In: PLDI, pp. 618-632 (2017)

37. Leroy, X.: Formal verification of a realistic compiler. CACM 52(7), 107-115 (2009)

38. Leroy, X., Blazy, S.: Formal verification of a C-like memory model and its uses for
verifying program transformations. JAR 41(1), 1-31 (2008)

39. Malecha, G.: Extensible proof engineering in intensional type theory. Ph.D. thesis,
Harvard University (2014)

40. Memarian, K., et al.: Exploring C semantics and pointer provenance. PACMPL
3(POPL), 67:1-67:32 (2019)

41. Memarian, K., et al.: Into the depths of C: elaborating the De Facto Standards.
In: PLDI, pp. 1-15 (2016)

42. Moy, Y., Marché, C.: The Jessie Plugin for Deduction Verification in Frama-C,
Tutorial and Reference Manual (2011)

43. Nienhuis, K., Memarian, K., Sewell, P.: An operational semantics for C/C++11
concurrency. In: OOPSLA, pp. 111-128 (2016)

44. Norrish, M.: C Formalised in HOL. Ph.D. thesis, University of Cambridge (1998)

45. Norrish, M.: Deterministic expressions in C. In: Swierstra, S.D. (ed.) ESOP 1999.
LNCS, vol. 1576, pp. 147-161. Springer, Heidelberg (1999).
https://doi.org/10.1007/3-540-49099-X_10

46. O’Hearn, P.W.: Resources, concurrency, and local reasoning. Theor. Comput. Sci.
375(1), 271-307 (2007). Festschrift for John C. Reynolds’s 70th birthday

47. Spitters, B., Van der Weegen, E.: Type classes for mathematics in type theory.
Math. Struct. Comput. Sci. 21(4), 795-825 (2011)

48. Stewart, G., Beringer, L., Cuellar, S., Appel, A.W.: Compositional CompCert. In:
POPL, pp. 275-287 (2015)


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 


The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.




Safe Deferred Memory Reclamation 
with Types 


Ismail Kuru and Colin S. Gordon


Drexel University, Philadelphia, USA 
{ik335,csgordon}@drexel.edu


Abstract. Memory management in lock-free data structures remains a 
major challenge in concurrent programming. Design techniques including 
read-copy-update (RCU) and hazard pointers provide workable solutions, 
and are widely used to great effect. These techniques rely on the concept 
of a grace period: nodes that should be freed are not deallocated imme- 
diately, and all threads obey a protocol to ensure that the deallocating 
thread can detect when all possible readers have completed their use of 
the object. This provides an approach to safe deallocation, but only when 
these subtle protocols are implemented correctly. 

We present a static type system to ensure correct use of RCU mem- 
ory management: that nodes removed from a data structure are always 
scheduled for subsequent deallocation, and that nodes are scheduled for 
deallocation at most once. As part of our soundness proof, we give an 
abstract semantics for RCU memory management primitives which cap- 
tures the fundamental properties of RCU. Our type system allows us to 
give the first proofs of memory safety for RCU linked list and binary 
search tree implementations without requiring full verification. 


1 Introduction 


For many workloads, lock-based synchronization — even fine-grained locking — has 
unsatisfactory performance. Often lock-free algorithms yield better performance, 
at the cost of more complex implementation and additional difficulty reasoning 
about the code. Much of this complexity is due to memory management: devel- 
opers must reason about not only other threads violating local assumptions, but 
whether other threads are finished accessing nodes to deallocate. At the time a 
node is unlinked from a data structure, an unknown number of additional threads 
may have already been using the node, having read a pointer to it before it was 
unlinked in the heap. 

A key insight for manageable solutions to this challenge is to recognize that 
just as in traditional garbage collection, the unlinked nodes need not be reclaimed 
immediately, but can instead be reclaimed later after some protocol finishes run- 
ning. Hazard pointers [29] are the classic example: all threads actively collaborate 
on bookkeeping data structures to track who is using a certain reference. For 
structures with read-biased workloads, Read-Copy-Update (RCU) [23] provides 
an appealing alternative. The programming style resembles a combination of 


© The Author(s) 2019 
L. Caires (Ed.): ESOP 2019, LNCS 11423, pp. 88-116, 2019. 
https://doi.org/10.1007/978-3-030-17184-1_4




reader-writer locks and lock-free programming. Multiple concurrent readers per- 
form minimal bookkeeping — often nothing they wouldn’t already do. A single 
writer at a time runs in parallel with readers, performing additional work to track 
which readers may have observed a node they wish to deallocate. There are now 
RCU implementations of many common tree data structures [3,5,8,19,24,33],
and RCU plays a key role in Linux kernel memory management [27]. 

However, RCU primitives remain non-trivial to use correctly: developers 
must ensure they release each node exactly once, from exactly one thread, 
after ensuring other threads are finished with the node in question. Model 
checking can be used to validate correctness of implementations for a mock 
client [1,7,17,21], but this does not guarantee correctness of arbitrary client 
code. Sophisticated verification logics can prove correctness of the RCU primi- 
tives and clients [12,15,22,32]. But these techniques require significant verifica- 
tion expertise to apply, and are specialized to individual data structures or imple- 
mentations. One important reason for the sophistication in these logics stems 
from the complexity of the underlying memory reclamation model. However, 
Meyer and Wolff [28] show that a suitable abstraction makes it possible to
separate verifying the correctness of a concurrent data structure from reasoning
about its underlying reclamation model, and they study proofs of correctness
under the assumption of memory safety.

We propose a type system to ensure that RCU client code uses the RCU 
primitives safely, ensuring memory safety for concurrent data structures using 
RCU memory management. We do this in a general way, not assuming the client 
implements any specific data structure, only one satisfying some basic properties 
common to RCU data structures (such as having a tree memory footprint). In 
order to do this, we must also give a formal operational model of the RCU 
primitives that abstracts many implementations, without assuming a particular 
implementation of the RCU primitives. We describe our RCU semantics and type 
system, prove our type system sound against the model (which ensures memory 
is reclaimed correctly), and show the type system in action on two important 
RCU data structures. 

Our contributions include: 


— A general (abstract) operational model for RCU-based memory management 

— A type system that ensures code uses RCU memory management correctly, 
which is significantly simpler than full-blown verification logics 

— Demonstration of the type system on two examples: a linked-list based bag 
and a binary search tree 

— A proof that the type system guarantees memory safety when using RCU 
primitives. 


2 Background and Motivation 


In this section, we recall the general concepts of read-copy-update concurrency. 
We use the RCU linked-list-based bag [25] from Fig. 1 as a running example. It 
includes annotations for our type system, which will be explained in Sect. 4.2. 
 1 struct BagNode{
 2   int data;
 3   BagNode<rcuItr> Next;
 4 }
 5 BagNode<rcuRoot> head;
 6 void add(int toAdd){
 7   WriteBegin;
 8   BagNode nw = new;
 9   {nw: rcuFresh {}}
10   nw.data = toAdd;
11   {head: rcuRoot, par: undef, cur: undef}
12   BagNode<rcuItr> par,cur = head;
13   {head: rcuRoot, par: rcuItr ε {}}
14   {cur: rcuItr ε {}}
15   cur = par.Next;
16   {cur: rcuItr Next {}}
17   {par: rcuItr ε {Next ↦ cur}}
18   while(cur.Next != null){
19     {cur: rcuItr (Next)^k.Next {}}
20     {par: rcuItr (Next)^k {Next ↦ cur}}
21     par = cur;
22     cur = par.Next;
23     {cur: rcuItr (Next)^k.Next.Next {}}
24     {par: rcuItr (Next)^k.Next {Next ↦ cur}}
25   }
26   {nw: rcuFresh {}}
27   {cur: rcuItr (Next)^k.Next {Next ↦ null}}
28   {par: rcuItr (Next)^k {Next ↦ cur}}
29   nw.Next = null;
30   {nw: rcuFresh {Next ↦ null}}
31   {cur: rcuItr (Next)^k.Next {Next ↦ null}}
32   cur.Next = nw;
33   {nw: rcuItr (Next)^k.Next.Next {Next ↦ null}}
34   {cur: rcuItr (Next)^k.Next {Next ↦ nw}}
35   WriteEnd;
36 }

 1 void remove(int toDel){
 2   WriteBegin;
 3   {head: rcuRoot, par: undef, cur: undef}
 4   BagNode<rcuItr> par,cur = head;
 5   {head: rcuRoot, par: rcuItr ε {}, cur: rcuItr ε {}}
 6   cur = par.Next;
 7   {cur: rcuItr Next {}}
 8   {par: rcuItr ε {Next ↦ cur}}
 9   while(cur.Next != null && cur.data != toDel)
10   {
11     {cur: rcuItr (Next)^k.Next {}}
12     {par: rcuItr (Next)^k {Next ↦ cur}}
13     par = cur;
14     cur = par.Next;
15     {cur: rcuItr (Next)^k.Next.Next {}}
16     {par: rcuItr (Next)^k.Next {Next ↦ cur}}
17   }
18   {nw: rcuFresh {}}
19   {par: rcuItr (Next)^k {Next ↦ cur}}
20   {cur: rcuItr (Next)^k.Next {}}
21   BagNode<rcuItr> curl = cur.Next;
22   {cur: rcuItr (Next)^k.Next {Next ↦ curl}}
23   {curl: rcuItr (Next)^k.Next.Next {}}
24   par.Next = curl;
25   {par: rcuItr (Next)^k {Next ↦ curl}}
26   {cur: unlinked}
27   {cur: rcuItr (Next)^k.Next {}}
28   SyncStart;
29   SyncStop;
30   {cur: freeable}
31   Free(cur);
32   {cur: undef}
33   WriteEnd;
34 }


Fig. 1. RCU client: singly linked list based bag implementation. 


As with concrete RCU implementations, we assume threads operating on 
a structure are either performing read-only traversals of the structure—reader 
threads—or are performing an update—writer threads—similar to the use of 
many-reader single-writer reader-writer locks.¹ It differs, however, in that readers
may execute concurrently with the (single) writer. 

This distinction, and some runtime bookkeeping associated with the read- 
and write-side critical sections, allow this model to determine at modest cost 
when a node unlinked by the writer can safely be reclaimed. 

Figure 1 gives the code for adding and removing nodes from a bag. Type 
checking for all code, including membership queries for bag, can be found in 
our technical report [20]. Algorithmically, this code is nearly the same as any 
sequential implementation. There are only two differences. First, the read-side 
critical section in member is indicated by the use of ReadBegin and ReadEnd; the 
write-side critical section is between WriteBegin and WriteEnd. Second, rather 
than immediately reclaiming the memory for the unlinked node, remove calls 


¹ RCU implementations supporting multiple concurrent writers exist [3], but are the
minority.




SyncStart to begin a grace period—a wait for reader threads that may still hold 
references to unlinked nodes to finish their critical sections. SyncStop blocks 
execution of the writer thread until these readers exit their read critical section 
(via ReadEnd). These are the essential primitives for the implementation of an 
RCU data structure. 

These six primitives together track a critical piece of information: which 
reader threads’ critical sections overlapped the writer’s. Implementing them
efficiently is challenging [8], but possible. The Linux kernel, for example, finds ways
to reuse existing task switch mechanisms for this tracking, so readers incur no
additional overhead. The reader primitives are semantically straightforward:
they atomically record the start, or completion, of a read-side critical section.

The more interesting primitives are the write-side primitives and memory 
reclamation. WriteBegin performs a (semantically) standard mutual exclusion 
with regard to other writers, so only one writer thread may modify the structure 
or the writer structures used for grace periods. 

SyncStart and SyncStop implement grace periods [31]: a mechanism to wait 
for readers to finish with any nodes the writer may have unlinked. A grace period 
begins when a writer requests one, and finishes when all reader threads active 
at the start of the grace period have finished their current critical section. Any 
nodes a writer unlinks before a grace period are physically unlinked, but not 
logically unlinked until after one grace period. 

An attentive reader might already have noticed that our usage of logical/physical
unlinking differs from the one in the data-structures literature, where typically
a logical deletion (e.g., marking) is followed by a physical deletion (unlinking).
Because all threads are forbidden from holding an interior reference into the
data structure after leaving their critical sections, waiting for active readers to 
finish their critical sections ensures they are no longer using any nodes the writer 
unlinked prior to the grace period. This makes actually freeing an unlinked node 
after a grace period safe. 

SyncStart conceptually takes a snapshot of all readers active when it is run. 
SyncStop then blocks until all those threads in the snapshot have finished at least 
one critical section. SyncStop does not wait for all readers to finish, and does not 
wait for all overlapping readers to simultaneously be out of critical sections. 

To date, every description of RCU semantics, most centered around the 
notion of a grace period, has been given algorithmically, as a specific (effi- 
cient) implementation. While the implementation aspects are essential to real 
use, the lack of an abstract characterization makes judging the correctness of 
these implementations (or of their clients) difficult in general. In Sect. 3 we give
a formal, abstract operational semantics for RCU implementations: inefficient if
implemented directly, but correct from a memory-safety and programming-model
perspective, and not tied to specific low-level RCU implementation details. To use
these semantics or a concrete implementation correctly, client code must ensure:


— Reader threads never modify the structure 
— No thread holds an interior pointer into the RCU structure across critical 
sections 
— Unlinked nodes are always freed by the unlinking thread after the unlinking, 
after a grace period, and inside the critical section 
— Nodes are freed at most once 


In practice, RCU data structures typically ensure additional invariants to sim- 
plify the above, e.g.: 


— The data structure is always a tree 
— A writer thread unlinks or replaces only one node at a time. 


Our type system in Sect. 4 guarantees these invariants.
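
The client obligations listed above amount to a small per-node protocol. As an illustration only (it is not part of the formal development, and the names NodeState, NodeTracker, and ProtocolError are hypothetical), the following Python sketch tracks one node through the states that Fig. 1 annotates as rcuItr, unlinked, freeable, and undef. It rejects a free that skips the grace period or happens twice:

```python
from enum import Enum, auto


class NodeState(Enum):
    LINKED = auto()     # reachable from the root (an rcuItr reference in Fig. 1)
    UNLINKED = auto()   # removed from the structure; readers may still hold it
    FREEABLE = auto()   # a grace period has elapsed since the unlink
    FREED = auto()      # memory reclaimed (undef in Fig. 1)


class ProtocolError(Exception):
    """Raised when a step violates the client obligations."""


class NodeTracker:
    """Hypothetical checker for a single node handled by the writer."""

    def __init__(self):
        self.state = NodeState.LINKED

    def unlink(self):
        if self.state is not NodeState.LINKED:
            raise ProtocolError("only a linked node can be unlinked")
        self.state = NodeState.UNLINKED

    def grace_period(self):
        # A grace period only changes the status of unlinked nodes.
        if self.state is NodeState.UNLINKED:
            self.state = NodeState.FREEABLE

    def free(self):
        if self.state is not NodeState.FREEABLE:
            raise ProtocolError(f"cannot free a node in state {self.state.name}")
        self.state = NodeState.FREED
```

Calling unlink, grace_period, and free in that order succeeds, mirroring the end of remove in Fig. 1. Calling free directly after unlink, or calling free a second time, raises ProtocolError. The type system aims to rule out such sequences statically rather than at run time.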


3 Semantics 


In this section, we outline the details of an abstract semantics for RCU imple- 
mentations. It captures the core client-visible semantics of most RCU primitives, 
but not the implementation details required for efficiency [27]. In our semantics, 
shown in Fig. 2, an abstract machine state, MState, contains: 


— A stack, s, of type Var × TID → Loc
— A heap, h, of type Loc × FName → Val
— A lock, l, of type TID ⊎ {unlocked}
— A root location, rt, of type Loc
— A read set, R, of type P(TID), and
— A bounding set, B, of type P(TID)


The lock l enforces mutual exclusion between write-side critical sections. 
The root location rt is the root of an RCU data structure. We model only a 
single global RCU data structure; the generalization to multiple structures is 
straightforward but complicates formal development later in the paper. The 
reader set R tracks the thread IDs (TIDs) of all threads currently executing 
a read block. The bounding set B tracks which threads the writer is actively 
waiting for during a grace period—it is empty if the writer is not waiting. 

Figure 2 gives operational semantics for atomic actions; conditionals, loops, 
and sequencing all have standard semantics, and parallel composition uses 
sequentially-consistent interleaving semantics. 

The first few atomic actions, for writing and reading fields, assigning among 
local variables, and allocating new objects, are typical of formal semantics for 
heaps and mutable local variables. Free is similarly standard. A writer thread’s 
critical section is bounded by WriteBegin and WriteEnd, which acquire and release 
the lock that enforces mutual exclusion between writers. WriteBegin only reduces 
(acquires) if the lock is unlocked. 
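
To make the shape of these actions concrete, here is a minimal Python sketch of the stack, heap, and lock components only. It is an illustration under simplifying assumptions: the class name Machine, the fixed field set, the integer locations, and the ownership check in write_end are assumptions of the sketch, and the sketch records the acquiring thread's ID in the lock.

```python
UNLOCKED = None             # stands for the `unlocked` lock value
FIELDS = ("data", "Next")   # assumed finite field set, as in the bag example


class Machine:
    """Hypothetical fragment of MState: stack, heap and writer lock only."""

    def __init__(self):
        self.stack = {}     # (variable, tid) -> location
        self.heap = {}      # (location, field) -> value
        self.lock = UNLOCKED
        self._next_loc = 0

    def write_begin(self, tid):
        # The step is enabled only when no writer holds the lock.
        if self.lock is not UNLOCKED:
            return False
        self.lock = tid
        return True

    def write_end(self, tid):
        # Assumption of this sketch: only the lock holder releases.
        assert self.lock == tid, "only the lock holder may release"
        self.lock = UNLOCKED

    def new(self, y, tid):              # y = new
        loc = self._next_loc
        self._next_loc += 1
        for f in FIELDS:
            self.heap[(loc, f)] = None  # fresh node: every field is null
        self.stack[(y, tid)] = loc

    def store(self, x, f, y, tid):      # x.f = y
        self.heap[(self.stack[(x, tid)], f)] = self.stack[(y, tid)]

    def load(self, y, x, f, tid):       # y = x.f
        self.stack[(y, tid)] = self.heap[(self.stack[(x, tid)], f)]

    def free(self, x, tid):             # Free(x)
        loc = self.stack[(x, tid)]
        for f in FIELDS:
            del self.heap[(loc, f)]     # fields of a freed node are undefined
```

A second writer calling write_begin while the lock is held gets False back, which models a step that is not enabled rather than an error.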

Standard RCU APIs include a primitive synchronize_rcu() to wait for a 
grace period for the current readers. We decompose this here into two actions, 
SyncStart and SyncStop. SyncStart initializes the blocking set (the bounding set
B above) to the current set of readers, that is, the threads that may have already
observed any nodes the writer has unlinked. SyncStop blocks until the blocking set
is emptied by completing
α ::= skip | x.f = y | y = x | y = x.f | y = new | Free(x) | Sync        Sync ≜ SyncStart; SyncStop

(RCU-WBEGIN)  ⟦WriteBegin⟧ (s, h, unlocked, rt, R, B) ⇓_tid (s, h, l, rt, R, B)
(RCU-WEND)    ⟦WriteEnd⟧ (s, h, l, rt, R, B) ⇓_tid (s, h, unlocked, rt, R, B)
(RCU-RBEGIN)  ⟦ReadBegin⟧ (s, h, tid, rt, R, B) ⇓_tid (s, h, tid, rt, R ⊎ {tid}, B)    tid ≠ l
(RCU-REND)    ⟦ReadEnd⟧ (s, h, tid, rt, R ⊎ {tid}, B) ⇓_tid (s, h, tid, rt, R, B \ {tid})    tid ≠ l
(RCU-SSTART)  ⟦SyncStart⟧ (s, h, l, rt, R, ∅) ⇓_tid (s, h, l, rt, R, R)
(RCU-SSTOP)   ⟦SyncStop⟧ (s, h, l, rt, R, ∅) ⇓_tid (s, h, l, rt, R, ∅)
(FREE)        ⟦Free(x)⟧ (s, h, l, rt, R, ∅) ⇓_tid (s, h', l, rt, R, ∅)
              provided ∀f, o'. rt ≠ s(x, tid) and o' ≠ s(x, tid) ⟹ h(o', f) = h'(o', f), and ∀f. h'(o, f) = undef
(HUPDT)       ⟦x.f = y⟧ (s, h, l, rt, R, B) ⇓_tid (s, h[s(x, tid), f ↦ s(y, tid)], l, rt, R, B)
(HREAD)       ⟦y = x.f⟧ (s, h, l, rt, R, B) ⇓_tid (s[(y, tid) ↦ h(s(x, tid), f)], h, l, rt, R, B)
(SUPDT)       ⟦y = x⟧ (s, h, l, rt, R, B) ⇓_tid (s[(y, tid) ↦ s(x, tid)], h, l, rt, R, B)
(ALLOC)       ⟦y = new⟧ (s, h, l, rt, R, B) ⇓_tid (s, h[ℓ ↦ nullmap], l, rt, R, B)
              provided rt ≠ s(y, tid) and s[(y, tid) ↦ ℓ], and
              h[ℓ ↦ nullmap] = λ(o', f). if o = o' then skip else h(o', f)


Fig. 2. Operational semantics for RCU. 


reader threads. However, it does not wait for all readers to finish, and does not 
wait for all overlapping readers to simultaneously be out of critical sections. If 
two reader threads A and B overlap some SyncStart-SyncStop’s critical section, 
it is possible that A may exit and re-enter a read-side critical section before 
B exits, and vice versa. Implementations must distinguish subsequent read-side 
critical sections from earlier ones that overlapped the writer’s initial request to 
wait: since SyncStart is used after a node is physically removed from the data 
structure and readers may not retain RCU references across critical sections, A 
re-entering a fresh read-side critical section will not permit it to re-observe the 
node to be freed. 

Reader thread critical sections are bounded by ReadBegin and ReadEnd. 
ReadBegin simply records the current thread’s presence as an active reader. 
ReadEnd removes the current thread from the set of active readers, and also 
removes it (if present) from the blocking set—if a writer was waiting for a cer- 
tain reader to finish its critical section, this ensures the writer no longer waits 
once that reader has finished its current read-side critical section. 

Grace periods are implemented by the combination of ReadBegin, ReadEnd, 
SyncStart, and SyncStop. ReadBegin ensures the set of active readers is known. 
When a grace period is required, SyncStart; SyncStop will store (in B) the active
readers (which may have observed nodes before they were unlinked), and wait 
for reader threads to record when they have completed their critical section (and 
implicitly, dropped any references to nodes the writer wants to free) via ReadEnd. 

These semantics do permit a reader in the blocking set to finish its read-side 
critical section and enter a new read-side critical section before the writer wakes. 
In this case, the writer waits only for the first critical section of that reader to 
complete, since entering the new critical section adds the thread’s ID back to R, 
but not B. 
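
The interplay between R and B described above can be replayed with a few lines of Python. This is a sketch of the two sets only, not of the full machine state: the class name GraceModel, the method names, and the thread names t1 and t2 are hypothetical, and blocking is modeled as a boolean "is SyncStop enabled" query rather than as real thread suspension.

```python
class GraceModel:
    """Hypothetical model of the read set R and the bounding set B only."""

    def __init__(self):
        self.R = set()   # threads currently inside a read-side critical section
        self.B = set()   # threads the writer is still waiting for

    def read_begin(self, tid):
        self.R.add(tid)

    def read_end(self, tid):
        self.R.discard(tid)
        self.B.discard(tid)    # the writer stops waiting for this reader

    def sync_start(self):
        assert not self.B, "a grace period is already in progress"
        self.B = set(self.R)   # snapshot of the readers active right now

    def sync_stop_enabled(self):
        # SyncStop can step only once every snapshotted reader has left.
        return not self.B


# Scenario from the text: a snapshotted reader leaves and re-enters.
g = GraceModel()
g.read_begin("t1")
g.read_begin("t2")
g.sync_start()        # B is now {t1, t2}
g.read_end("t1")      # t1 leaves its first critical section
g.read_begin("t1")    # t1 re-enters: added to R again, but not to B
g.read_end("t2")      # B is now empty, although t1 is still reading
writer_may_proceed = g.sync_stop_enabled()
```

At the end of the scenario the writer may proceed even though t1 is inside a read-side critical section, because that section began after the snapshot and so cannot have observed the unlinked node.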
4 Type System and Programming Language 


In this section, we present a simple imperative programming language with two 
block constructs for modeling RCU, and a type system that ensures proper 
(memory-safe) use of the language. The type system ensures memory safety 
by enforcing these sufficient conditions: 


— A heap node can only be freed if it is no longer accessible from an RCU data
structure or from local variables of other threads. To achieve this, we ensure
that reachability and access can be suitably restricted, and we explain how
our types support a delayed ownership transfer for the deallocation.

— Local variables may not point inside an RCU data structure unless they are 
inside an RCU read or write block. 

— Heap mutations are local: each unlinks or replaces exactly one node. 

— The RCU data structure remains a tree. While not a fundamental constraint 
of RCU, it is a common constraint across known RCU data structures because 
it simplifies reasoning (by developers or a type system) about when a node 
has become unreachable in the heap. 


We also demonstrate that the type system is not only sound, but useful: we 
show how it types Fig. 1’s list-based bag implementation [25]. We also give type 
checked fragments of a binary search tree to motivate advanced features of the 
type system; the full typing derivation can be found in our technical report [20] 
Appendix B. The BST requires type narrowing operations that refine a type 
based on dynamic checks (e.g., determining which of several fields links to a 
node). In our system, we presume all objects contain all fields, but the number 
of fields is finite (and in our examples, small). This avoids additional overhead 
from tracking well-established aspects of the type system (class and field types 
and presence, for example) and lets us focus on checking correct use of RCU primitives. 
Essentially, we assume the code our type system applies to is already type-correct 
for a system like C or Java’s type system. 


4.1 RCU Type System for Write Critical Section 


Section 4.1 introduces RCU types and the need for subtyping. Section 4.2 shows 
how types describe program states, through code for Fig. 1’s list-based bag 
example. Section 4.3 introduces the type system itself. 


RCU Types. There are six types used in write critical sections: 


T ::= rcultr ρ N | rcuFresh M | unlinked | undef | freeable | rcuRoot 


rcultr is the type given to references pointing into a shared RCU data structure. 
An rcultr type can be used in either a write region or a read region (without 
the additional components). It indicates both that the reference points into the 
shared RCU data structure and that the heap location referenced by the rcultr 
reference is reachable by following the path ρ from the root. A component N is a 
set of field mappings taking field names to local variable names. Field maps 
are extended when the referent’s fields are read. The field map and path com- 
ponents track reachability from the root, and local reachability between nodes. 
These are used to ensure the structure remains acyclic, and for the type system 
to recognize exactly when unlinking can occur. 

Read-side critical sections use rcultr without path or field map components. 
These components are both unnecessary for readers (who perform no updates) 
and would be invalidated by writer threads anyway. Under the assumption 
that reader threads do not hold references across critical sections, the read- 
side rules essentially only ensure the reader performs no writes, so we omit the 
reader critical section type rules. They can be found in our technical report [20] 
Appendix E. 


unlinked is the type given to references to unlinked heap locations—objects 
previously part of the structure, but now unreachable via the heap. A heap 
location referenced by an unlinked reference may still be accessed by reader 
threads, which may have acquired their own references before the node became 
unreachable. Newly-arrived readers, however, will be unable to gain access to 
these referents. 


freeable is the type given to references to an unlinked heap location that is safe 
to reclaim because it is known that no concurrent readers hold references to it. 
Unlinked references become freeable after a writer has waited for a full grace 
period. 


undef is the type given to references where the content of the referenced location 
is inaccessible. A local variable of type freeable becomes undef after reclaiming 
that variable’s referent. 


rcuFresh is the type given to references to freshly allocated heap locations. 
Like the rcultr type, it carries a set of field mappings M. For a memory-safe 
replacement, the field mappings of the rcuFresh reference must be the same as 
the field mappings of the rcultr reference whose referent it replaces. 


rcuRoot is the type given to the fixed reference to the root of the RCU data 
structure. It may not be overwritten. 


Subtyping. It is sometimes necessary to use imprecise types—mostly for con- 
trol flow joins. Our type system performs these abstractions via subtyping on 
individual types and full contexts, as in Fig. 3. 

Figure 3 includes four judgments for subtyping. The first two, ⊢ N <: N' 
and ⊢ ρ <: ρ', describe relaxations of field maps and paths respectively. 
⊢ N <: N' is read as “the field map N is more precise than N'” and similarly 
for paths. The third judgment ⊢ T <: T' uses path and field map subtyping to 
give subtyping among rcultr types: one rcultr is a subtype of another if its paths 
N = { f0|...|fn ↦ {y} | fi ∈ FName ∧ 0 ≤ i ≤ n ∧ (y ∈ Var ∨ y ∈ {null}) } 
N_{f,∅} = N \ {f ↦ _} 
N_∅ = {} 
N(∪ f ↦ y) = N ∪ {f ↦ y} 
N(\ f ↦ y) = N \ {f ↦ y} 
N([f ↦ y]) = N where f ↦ y ∈ N 
N([f ↦ x \ y]) = N \ {f ↦ x} ∪ {f ↦ y} 


(T-NSUB1)  ⊢ N([f1 ↦ y]) <: N([f1|f2 ↦ y]) 

(T-NSUB2)  ⊢ N([f2 ↦ y]) <: N([f1|f2 ↦ y]) 

(T-NSUB3)  ⊢ N_{f,∅} <: N([f ↦ y]) 

(T-NSUB4)  ⊢ N_∅ <: N 

(T-NSUB5)  ⊢ N <: N'    ⊢ N' <: N'' 
           --------------------------- 
           ⊢ N <: N'' 

(T-PSUB1)  ⊢ ρ.f1 <: ρ.f1|f2 

(T-PSUB2)  ⊢ ρ.f2 <: ρ.f1|f2 

(T-PSUB3)  ⊢ ρ <: ρ 

(T-TSUB1)  ⊢ rcultr <: rcultr 

(T-TSUB2)  ⊢ rcultr _ <: undef 

(T-TSUB)   ⊢ ρ <: ρ'    ⊢ N <: N' 
           ------------------------------- 
           ⊢ rcultr ρ N <: rcultr ρ' N' 

(T-CSUB1)  ⊢ Γ <: Γ'    ⊢ T <: T' 
           ------------------------------- 
           ⊢ Γ, x:T <: Γ', x:T' 

(T-CSUB)   ⊢ ε <: ε 


Fig. 3. Subtyping rules. 


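(T-LOOP1) 
  Γ(x) = bool    Γ ⊢M C ⊣ Γ 
  --------------------------- 
  Γ ⊢M while(x){C} ⊣ Γ 

(T-REINDEX) 
  Γ ⊢M C ⊣ Γ' 
  --------------------------------- 
  Γ ⊢M C ⊣ Γ'[ρ.f^k / ρ.f^k.f] 

(T-BRANCH1) 
  Γ, x:rcultr ρ N([f1 ↦ z]) ⊢M C1 ⊣ Γ'    Γ, x:rcultr ρ N([f2 ↦ z]) ⊢M C2 ⊣ Γ' 
  ---------------------------------------------------------------------------- 
  Γ, x:rcultr ρ N([f1|f2 ↦ z]) ⊢M if(x.f1 == z) then C1 else C2 ⊣ Γ' 

(T-BRANCH2) 
  Γ(x) = bool    Γ ⊢M C1 ⊣ Γ'    Γ ⊢M C2 ⊣ Γ' 
  ---------------------------------------------- 
  Γ ⊢M if(x) then C1 else C2 ⊣ Γ' 

(T-BRANCH3) 
  Γ, x:rcultr ρ N([f ↦ y \ null]) ⊢M C1 ⊣ Γ'    Γ, x:rcultr ρ N([f ↦ y]) ⊢M C2 ⊣ Γ' 
  --------------------------------------------------------------------------------- 
  Γ, x:rcultr ρ N([f ↦ y]) ⊢M if(x.f == null) then C1 else C2 ⊣ Γ' 

(T-LOOP2) 
  Γ, x:rcultr ρ N([f ↦ _]) ⊢M C ⊣ Γ, x:rcultr ρ' N([f ↦ _]) 
  --------------------------------------------------------------------------------- 
  Γ, x:rcultr ρ N([f ↦ _]) ⊢M while(x.f ≠ null){C} ⊣ x:rcultr ρ' N([f ↦ null]), Γ 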


Fig. 4. Type rules for control-flow. 


and field maps are similarly more precise. It also allows rcultr references to be 
“forgotten”, which is occasionally needed to satisfy non-interference checks in the 
type rules. The final judgment ⊢ Γ <: Γ' extends subtyping to all assumptions 
in a type context. 

It is often necessary to abstract the contents of field maps or paths, without 
simply forgetting the contents entirely. In a binary search tree, for example, 
it may be the case that one node is a child of another, but which parent field 
points to the child depends on which branch was followed in an earlier conditional 
(consider the lookup in a BST, which alternates between following left and right 
children). In Fig. 5, we see that cur aliases different fields of par (either Left or 
Right) in different branches of the conditional. The types after the conditional 
must overapproximate this, here as Left|Right ↦ cur in par’s field map, and a 
similar path disjunction in cur’s path. This is reflected in Fig. 3’s T-NSUB1-5 
and T-PSUB1-2: within each branch, each type is coerced to a supertype to 
validate the control flow join. 
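The join at the end of a conditional can be sketched in a few lines. This is only an illustration under simplifying assumptions: field maps are modelled as dictionaries from a set of alternative field names to a variable, paths as tuples of such sets, and the helper names `join_field_maps` and `join_paths` are hypothetical. The sketch only joins maps that track the same variables and paths of equal length.

```python
def join_field_maps(left, right):
    """Join two field maps, each a dict {frozenset_of_fields: variable}."""
    by_var_left = {var: fields for fields, var in left.items()}
    by_var_right = {var: fields for fields, var in right.items()}
    if by_var_left.keys() != by_var_right.keys():
        raise ValueError("this sketch only joins maps that track the same variables")
    # Widen each entry to the disjunction of the fields seen in either branch.
    return {by_var_left[v] | by_var_right[v]: v for v in by_var_left}


def join_paths(left, right):
    """Join two paths step by step; each step is a frozenset of field names."""
    if len(left) != len(right):
        raise ValueError("this sketch only joins paths of equal length")
    return tuple(a | b for a, b in zip(left, right))


L, R = frozenset({"Left"}), frozenset({"Right"})
par_map = join_field_maps({L: "cur"}, {R: "cur"})  # Left|Right -> cur
cur_path = join_paths((L, L), (R, R))              # Left|Right . Left|Right
```

The two results correspond to the field map of par and the path of cur after the conditional of Fig. 5.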

Another type of control flow join is handling loop invariants — where paths 
entering the loop meet the back-edge from the end of a loop back to the start for 
repetition. Because our types include paths describing how they are reachable 
from the root, some abstraction is required to give loop invariants that work for 
any number of iterations — in a loop traversing a linked list, the iterator pointer 
would naively have different paths from the root on each iteration, so the exact 
path is not loop invariant. However, the paths explored by a loop are regular, 
so we can abstract the paths by permitting (implicitly) existentially quantified 
indexes on path fragments, which express the existence of some path, without 
saying which path. The use of an explicit abstract repetition allows the type 
system to preserve the fact that different references have common path prefixes, 
even after a loop. 

Assertions for the add function in lines 19 and 20 of Fig. 1 show the loop’s 
effects on paths of iterator references used inside the loop, cur and par. On line 
20, par’s path contains (Next)^k. The k in (Next)^k abstracts the number 
of loop iterations run, implicitly assumed to be non-negative. The trailing Next 
in cur’s path on line 19, (Next)^k.Next, expresses the relationship between 
cur and par: par is reachable from the root by following Next k times, and cur 
is reachable via one additional Next. The types of lines 19 and 20, however, are 
not the same as those of lines 23 and 24, so an additional adjustment is needed 
for the types to become loop-invariant. Reindexing (T-REINDEX in Fig. 4) 
effectively increments an abstract loop counter, contracting (Next)^k.Next to 
(Next)^k everywhere in a type environment. This expresses the same relationship 
between par and cur as before the loop, but the choice of k to make these paths 
accurate after each iteration would be one larger than the choice before. 
Reindexing the type environment of lines 23-24 yields the type environment of 
lines 19-20, making the types loop invariant. The reindexing essentially chooses 
a new value for the abstract k. This is sound, because the uses of framing in the 
heap mutation related rules of the type system ensure uses of any indexing 
variable are never separated: either all are reindexed, or none are. 
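Reindexing can be pictured as a rewrite on paths. The sketch below assumes a path is a list of steps `(field, index)`, where `index` is `None` for a single step and a name such as `"k"` for an abstract repetition. The functions `reindex` and `reindex_env` are hypothetical, and the sample environment is an assumed rendering of the types after one loop iteration, not a copy of Fig. 1.

```python
def reindex(path, index="k"):
    """Contract `f^k . f` to `f^k`, absorbing one step per repetition."""
    result = []
    absorbed = False
    for field, rep in path:
        if rep == index:
            absorbed = False
            result.append((field, rep))
        elif rep is None and not absorbed and result and result[-1] == (field, index):
            absorbed = True  # this step is folded into the abstract repetition
        else:
            result.append((field, rep))
    return result


def reindex_env(env, index="k"):
    """Reindex every path in the environment at once, never just some of them."""
    return {var: reindex(path, index) for var, path in env.items()}


after_body = {
    "par": [("Next", "k"), ("Next", None)],
    "cur": [("Next", "k"), ("Next", None), ("Next", None)],
}
invariant = reindex_env(after_body)
```

Applying the rewrite to the whole environment in one call mirrors the requirement that uses of an index are reindexed together.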

While abstraction is required to deal with control flow joins, reasoning about 
whether and which nodes are unlinked or replaced, and whether cycles are cre- 
ated, requires precision. Thus the type system also includes means (Fig. 4) to 
refine imprecise paths and field maps. In Fig. 5, we see a conditional with the 
condition par.Left == cur. The type system matches this condition to the 
imprecise types in line 1’s typing assertion, and refines the initial type assump- 
tions in each branch accordingly (lines 2 and 7) based on whether execution 
reflects the truth or falsity of that check. Similarly, it is sometimes required 
to check — and later remember — whether a field is null, and the type system 
supports this. 
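The refinement step for a condition such as par.Left == cur can be sketched as follows. The representation (a dictionary from a set of alternative field names to a variable) and the function name `refine` are assumptions made for illustration; the sketch follows the shape of T-BRANCH1 only informally.

```python
def refine(field_map, field, var):
    """Split a disjunctive entry `f1|f2 -> var` on the test `x.field == var`.

    Returns the field maps assumed in the then-branch and in the else-branch.
    """
    for fields, target in field_map.items():
        if target == var and field in fields and len(fields) > 1:
            rest = {k: v for k, v in field_map.items() if k != fields}
            then_map = {**rest, frozenset({field}): var}
            else_map = {**rest, fields - {field}: var}
            return then_map, else_map
    raise ValueError("no imprecise entry matches the condition")


par_map = {frozenset({"Left", "Right"}): "cur"}
then_map, else_map = refine(par_map, "Left", "cur")
```

The then-branch learns that Left points to cur, and the else-branch learns that the remaining alternative, Right, does.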




1 {cur : rcultr Left|Right {}, par : rcultr ε {Left|Right ↦ cur}} 
2 if(par.Left == cur){ 
3   {cur : rcultr Left {}, par : rcultr ε {Left ↦ cur}} 
4   par = cur; 
5   cur = par.Left; 
6   {cur : rcultr Left.Left {}, par : rcultr Left {Left ↦ cur}} 
7 }else{ 
8   {cur : rcultr Right {}, par : rcultr ε {Right ↦ cur}} 
9   par = cur; 
10  cur = par.Right; 
11  {cur : rcultr Right.Right {}, par : rcultr Right {Right ↦ cur}} 
12 } 
13 {cur : rcultr Left|Right.Left|Right {}, par : rcultr Left|Right {Left|Right ↦ cur}} 


Fig. 5. Choosing fields to read. 


4.2 Types in Action 


The system has three forms of typing judgement: Γ ⊢ C for standard typing 
outside RCU critical sections; Γ ⊢R C ⊣ Γ' for reader critical sections, and 
Γ ⊢M C ⊣ Γ' for writer critical sections. The first two are straightforward, 
essentially preventing mutation of the data structure, and preventing nesting 
of a writer critical section inside a reader critical section. The last, for writer 
critical sections, is flow sensitive: the types of variables may differ before and after 
program statements. This is required in order to reason about local assumptions 
at different points in the program, such as recognizing that a certain action may 
unlink a node. Our presentation here focuses exclusively on the judgment for the 
write-side critical sections. 

Below, we explain our types through the list-based bag implementation [25] 
from Fig. 1, highlighting how the type rules handle different parts of the code. 
Figure 1 is annotated with “assertions” (local type environments) in the style 
of a Hoare logic proof outline. As with Hoare proof outlines, these annotations 
can be used to construct a proper typing derivation. 


Reading a Global RCU Root. All RCU data structures have fixed roots, which 
we characterize with the rcuRoot type. Each operation in Fig. 1 begins by reading 
the root into a new rcultr reference used to begin traversing the structure. After 
each initial read (line 12 of add and line 4 of remove), the path of the cur 
reference is the empty path (ε) and the field map is empty ({}), because it is an alias to 
the root, and none of its field contents are known yet. 


Reading an Object Field and a Variable. As expected, we explore the heap 
of the data structure via reading the objects’ fields. Consider line 6 of remove 
and its corresponding pre- and post- type environments. Initially par’s field map 
is empty. After the field read, its field map is updated to reflect that its Next 
field is aliased in the local variable cur. Likewise, after the update, cur’s path 
is Next (= ε.Next), extending the par node’s path by the field read. This 
introduces field aliasing information that can subsequently be used to reason 
about unlinking. 
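The effect of a field read on the type environment can be sketched as a function on environments. This is an informal illustration in the spirit of T-READH, under assumed representations: an environment maps a variable to a pair of a path (a tuple of field names) and a field map (a dictionary from field name to variable). The function name `read_field` is hypothetical, and dropping stale entries that still name the overwritten variable is a choice of this sketch.

```python
def read_field(env, z, x, f):
    """Type-level effect of `z = x.f` on env: {var: (path, field_map)}."""
    path, fmap = env[x]
    # Free-variable condition: no other variable's field map may mention z.
    for other, (_, other_map) in env.items():
        if other not in (x, z) and z in other_map.values():
            raise ValueError(f"{z} is still named by the field map of {other}")
    kept = {g: v for g, v in fmap.items() if v != z}  # drop stale aliases of z
    new_env = dict(env)
    new_env[x] = (path, {**kept, f: z})  # x.f is now aliased by z
    new_env[z] = (path + (f,), {})       # z sits one field deeper than x
    return new_env


before = {"par": ((), {})}  # par aliases the root: empty path, empty map
after = read_field(before, "cur", "par", "Next")
```

After the read, par's field map records that Next is aliased by cur, and cur's path extends par's path by Next.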


Unlinking Nodes. Line 24 of remove in Fig. 1 unlinks a node. The type annota- 
tions show that before that line cur is in the structure (rcultr), while afterwards 
its type is unlinked. The type system checks that this unlink disconnects only 
one node: note how the types of par, cur, and curl just before line 24 completely 
describe a section of the list. 


Grace and Reclamation. After the referent of cur is unlinked, concurrent 
readers traversing the list may still hold references. So it is not safe to actually 
reclaim the memory until after a grace period. Lines 28-29 of remove initiate a 
grace period and wait for its completion. At the type level, this is reflected by the 
change of cur’s type from unlinked to freeable, reflecting the fact that the grace 
period extends until any reader critical sections that might have observed the 
node in the structure have completed. This matches the precondition required by 
our rules for calling Free, which further changes the type of cur to undef reflecting 
that cur is no longer a valid reference. The type system also ensures that no 
local (writer) aliases to the freed node exist. This enforcement is 
twofold. First, the type system requires that only unlinked heap nodes can be 
freed. Second, framing relations in rules related to the heap mutation ensure no 
local aliases still consider the node linked. 


Fresh Nodes. Some code must also allocate new nodes, and the type system 
must reason about how they are incorporated into the shared data structure. 
Line 8 of the add method allocates a new node nw, and lines 10 and 29 initialize 
its fields. The type system gives it a fresh type while tracking its field contents, 
until line 32 inserts it into the data structure. The type system checks that nodes 
previously reachable from cur remain reachable: note the field maps of cur and 
nw in lines 30-31 are equal (trivially, though in general the field need not be 
null). 


4.3 Type Rules 


Figure 6 gives the primary type rules used in checking write-side critical section 
code as in Fig. 1. 

T-ROOT reads a root pointer into an rcultr reference, and T-READS copies a 
local variable into another. In both cases, the free variable condition ensures that 
updating the modified variable does not invalidate field maps of other variables 
in Γ. These free variable conditions recur throughout the type system, and we 
will not comment on them further. T-ALLOC and T-FREE allocate and reclaim 
objects. These rules are relatively straightforward. T-READH reads a field into 
a local variable. As suggested earlier, this rule updates the post-environment to 
reflect that the overwritten variable z holds the same value as x.f. T-WRITEFH 
updates a field of a fresh (thread-local) object, similarly tracking the update in 
the fresh object’s field map at the type level. The remaining rules are a bit more 
involved, and form the heart of the type system. 


Grace Periods. T-SYNC gives pre- and post-environments to the compound 
statement SyncStart; SyncStop implementing grace periods. As mentioned earlier, 
this updates the environment afterwards to reflect that any nodes unlinked before 
the wait become freeable afterwards. 
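The lifecycle unlinked, freeable, undef can be sketched as two functions on a simplified environment that maps each variable to a type name only. The function names `sync` and `free` are hypothetical stand-ins for the effects of T-SYNC and T-FREE; paths and field maps are left out.

```python
def sync(env):
    """Effect of `SyncStart; SyncStop`: every unlinked variable becomes freeable."""
    return {var: ("freeable" if ty == "unlinked" else ty) for var, ty in env.items()}


def free(env, x):
    """Effect of `Free(x)`: allowed only on freeable variables, which become undef."""
    if env.get(x) != "freeable":
        raise TypeError(f"{x} has type {env.get(x)!r}, expected 'freeable'")
    return {**env, x: "undef"}


env = {"par": "rcultr", "cur": "unlinked"}
env_after_sync = sync(env)
env_after_free = free(env_after_sync, "cur")
```

Calling `free(env, "cur")` before `sync` raises an error in this sketch, which corresponds to reclaiming a node before its grace period has elapsed.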
(T-ROOT) 
  y ∉ FV(Γ) 
  ------------------------------------------------------------------ 
  Γ, r:rcuRoot, y:undef ⊢M y = r ⊣ y:rcultr ε N_∅, r:rcuRoot, Γ 

(T-READS) 
  z ∉ FV(Γ) 
  ------------------------------------------------------------------ 
  Γ, z:_, x:rcultr ρ N ⊢M z = x ⊣ x:rcultr ρ N, z:rcultr ρ N, Γ 

(T-ALLOC) 
  Γ, x:undef ⊢M x = new ⊣ x:rcuFresh N_∅, Γ 

(T-FREE) 
  x:freeable ⊢M Free(x) ⊣ x:undef 

(T-READH) 
  ρ.f = ρ'    z ∉ FV(Γ) 
  ------------------------------------------------------------------------------- 
  Γ, z:_, x:rcultr ρ N ⊢M z = x.f ⊣ x:rcultr ρ N([f ↦ z]), z:rcultr ρ' N_∅, Γ 

(T-WRITEFH) 
  z:rcultr ρ.f _    N(f) = z    f ∉ dom(N') 
  ------------------------------------------------------------------------------- 
  Γ, p:rcuFresh N', x:rcultr ρ N ⊢M p.f = z 
      ⊣ p:rcuFresh N'([f ↦ z]), x:rcultr ρ N([f ↦ z]), Γ 

(T-SYNC) 
  Γ ⊢M SyncStart; SyncStop ⊣ Γ[x:freeable / x:unlinked] 

(T-UNLINKH) 
  N(f1) = z    ρ.f1 = ρ1    ρ1.f2 = ρ2    N1(f2) = r    N' = N([f1 ↦ z \ r]) 
  ∀f ∈ dom(N1). f ≠ f2 ⟹ (N1(f) = null) 
  ∀n ∈ Γ, m, N3, ρ3, f. (n:rcultr ρ3 N3([f ↦ m])) ⟹ 
      (¬MayAlias(ρ3, {ρ, ρ1, ρ2}) ∧ (m ∉ {z, r}) ∧ (∀ρ4 ≠ ε. ¬MayAlias(ρ3, ρ2.ρ4))) 
  ------------------------------------------------------------------------------- 
  Γ, x:rcultr ρ N, z:rcultr ρ1 N1, r:rcultr ρ2 N2 ⊢M x.f1 = r 
      ⊣ z:unlinked, x:rcultr ρ N', r:rcultr ρ1 N2, Γ 

(T-REPLACE) 
  N(f) = o    N' = N([f ↦ o \ n])    ρ.f = ρ1    N1 = N2    FV(Γ) ∩ {p, o, n} = ∅ 
  ∀x ∈ Γ, N3, ρ2, f1, y. (x:rcultr ρ2 N3([f1 ↦ y])) ⟹ 
      (¬MayAlias(ρ2, {ρ, ρ1}) ∧ (y ≠ o)) 
  ------------------------------------------------------------------------------- 
  Γ, p:rcultr ρ N, o:rcultr ρ1 N1, n:rcuFresh N2 ⊢M p.f = n 
      ⊣ p:rcultr ρ N', n:rcultr ρ1 N2, o:unlinked, Γ 

(T-INSERT) 
  N' = N([f ↦ o \ n])    ρ.f = ρ1    ρ1.f4 = ρ2    N(f) = N1(f4) 
  ∀f2 ∈ dom(N1). f4 ≠ f2 ⟹ N1(f2) = null    FV(Γ) ∩ {p, o, n} = ∅ 
  ∀x ∈ Γ, N3, ρ3, f1, y. (x:rcultr ρ3 N3([f1 ↦ y])) ⟹ 
      (∀ρ4 ≠ ε. ¬MayAlias(ρ3, ρ.ρ4)) 
  ------------------------------------------------------------------------------- 
  Γ, p:rcultr ρ N, o:rcultr ρ1 N2, n:rcuFresh N1 ⊢M p.f = n 
      ⊣ p:rcultr ρ N', n:rcultr ρ1 N1, o:rcultr ρ2 N2, Γ 

(TORCUWRITE) 
  Γ, y:rcultr _ ⊢M C ⊣ Γ'    FType(f) = RCU 
  NoFresh(Γ')    NoUnlinked(Γ')    NoFreeable(Γ') 
  ------------------------------------------------------------------ 
  Γ ⊢ RCUWrite x.f as y in {C} 


Fig. 6. Type rules for write side critical section. 


Unlinking. T-UNLINKH type checks heap updates that remove a node from 
the data structure. The rule assumes three objects x, z, and r, whose identities 
we will conflate with the local variable names in the type rule. The rule checks 
the case where x.f1 == z and z.f2 == r initially (reflected in the path and field 
map components), and a write x.f1 = r removes z from the data structure (we 
assume, and ensure, the structure is a tree). 

The rule must also avoid unlinking multiple nodes: this is the purpose of the 
first (smaller) implication: it ensures that beyond the reference from z to r, all 
fields of z are null. 
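The single-node condition can be sketched as a predicate over field maps. The sketch assumes a field map is a dictionary from field name to variable name, with `None` standing for null; the function name `unlinks_exactly_one` is hypothetical and covers only this one premise of T-UNLINKH, not the aliasing checks.

```python
def unlinks_exactly_one(x_map, z_map, f1, f2, z, r):
    """Check x.f1 == z, z.f2 == r, and that every other known field of z is null."""
    return (
        x_map.get(f1) == z
        and z_map.get(f2) == r
        and all(target is None for field, target in z_map.items() if field != f2)
    )


# Linked list: par.Next == cur and cur.Next == nxt, so par.Next = nxt drops only cur.
list_ok = unlinks_exactly_one({"Next": "cur"}, {"Next": "nxt"}, "Next", "Next", "cur", "nxt")

# Tree node with a non-null Left child: unlinking would also drop that subtree.
tree_bad = unlinks_exactly_one(
    {"Left": "cur"}, {"Left": "a", "Right": "r"}, "Left", "Right", "cur", "r"
)
```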

Finally, the rule must ensure that no types in Γ are invalidated. This could 
happen one of two ways: either a field map in Γ for an alias of x duplicates 
(a) Freshly allocated heap node referenced by cf. 
(b) Safe replacement of the heap node referenced by cr with the fresh heap 
node referenced by cf. 


Fig. 7. Replacing existing heap nodes with fresh ones. Type rule T-REPLACE. 


the assumption that x.f1 == z (which is changed by this write), or Γ contains 
a descendant of r, whose path from the root will change when its ancestor is 
modified. The final assumption of T-UNLINKH (the implication) checks that for 
every rcultr reference n in Γ, it is not a path alias of x, z, or r; no entry of its field 
map (m) refers to r or z (which would imply n aliased x or z initially); and its 
path is not an extension of r (i.e., it is not a descendant). MayAlias is a predicate 
on two paths (or a path and set of paths) which is true if it is possible that any 
concrete paths the arguments may abstract (e.g., via adding non-determinism 
through | or abstracting iteration with indexing) could be the same. The negation 
of a MayAlias use is true only when the paths are guaranteed to refer to different 
locations in the heap. 
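One way to approximate such a predicate is to treat each abstract path as a small regular pattern and ask whether two patterns share a concrete path. The sketch below is an assumption-laden illustration, not the definition used in the technical report: a path is a list of steps `(fields, index)`, an indexed step may repeat any number of times (including zero), and indices in different paths are treated as independent. The last simplification is conservative, so the sketch may report an alias that the real predicate excludes.

```python
def may_alias(a, b):
    """True if some concrete path matches both abstract paths a and b."""
    seen, stack = set(), [(0, 0)]
    while stack:
        i, j = stack.pop()
        if (i, j) in seen:
            continue
        seen.add((i, j))
        if i == len(a) and j == len(b):
            return True
        if i < len(a) and a[i][1] is not None:
            stack.append((i + 1, j))  # a's repetition stops here
        if j < len(b) and b[j][1] is not None:
            stack.append((i, j + 1))  # b's repetition stops here
        if i < len(a) and j < len(b) and a[i][0] & b[j][0]:
            # Both sides take one step along a shared field name.
            ni = i if a[i][1] is not None else i + 1
            nj = j if b[j][1] is not None else j + 1
            stack.append((ni, nj))
    return False


def may_alias_any(path, paths):
    """Variant for a path against a set of paths."""
    return any(may_alias(path, other) for other in paths)


LEFT, RIGHT, NEXT = frozenset({"Left"}), frozenset({"Right"}), frozenset({"Next"})
either = [(LEFT | RIGHT, None)]
```

The search visits pairs of positions in the two patterns, so it terminates after at most `(len(a) + 1) * (len(b) + 1)` states.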


Replacing with a Fresh Node. Replacing with a rcuFresh reference faces the 
same aliasing complications as direct unlinking. We illustrate these challenges 
in Figs. 7a and b. Our technical report [20] also includes Figures 32a and 32b in 
Appendix D to illustrate complexities in unlinking. The square R nodes are root 
nodes, and H nodes are general heap nodes. All resources in thick straight lines 
and dotted lines form the memory footprint of a node replacement. The hollow 
thick circular nodes, pr and cr, point to the nodes involved in replacing H1 
(referenced by cr) with Hf (referenced by cf) in the structure. We may have a0 
and a1 which are aliases with pr and cr respectively. They are path-aliases as 
they share the same path from root to the node that they reference. Edge labels 
l and r are abbreviations for the Left and Right fields of a binary search tree. 
The thick dotted Hf denotes the freshly allocated heap node referenced by thick 
dotted cf. The thick dotted field l is set to point to the referent of cl and the 
thick dotted field r is set to point to the heap node referenced 
by lm. 

Hf initially (Fig. 7a) is not part of the shared structure. If it was, it would 
violate the tree shape requirement imposed by the type system. This is why we 
highlight it separately in thick dots: its static type would be rcuFresh. Note that 
we cannot duplicate a rcuFresh variable, nor read a field of an object it points 
to. This restriction localizes our reasoning about the effects of replacing with 
a fresh node to just one fresh reference and the object it points to. Otherwise 
another mechanism would be required to ensure that once a fresh reference was 
linked into the heap, there were no aliases still typed as fresh—since that would 
have risked linking the same reference into the heap in two locations. 

The transition from Fig. 7a to Fig. 7b illustrates the effects of the heap mutation 
(replacing with a fresh node). The reasoning in the type system for replacing 
with a fresh node is nearly the same as for unlinking an existing node, with one 
exception. In replacing with a fresh node, there is no need to consider the paths of 
nodes deeper in the tree than the point of mutation. In the unlinking case, those 
nodes’ static paths would become invalid. In the case of replacing with a fresh 
node, those descendants’ paths are preserved. Our type rule for ensuring safe 
replacement (T-REPLACE) prevents path aliasing (the nonexistence of a0 and a1, 
shown via dashed lines and circles) by negating a MayAlias query and 
prevents field mapping aliasing (the nonexistence of any object field from any 
other context pointing to cr) by asserting (y ≠ o). It is important to note that 
the objects (H4, H2) in the field mappings of cr, whose referent is to be unlinked, 
are captured by the field mappings of the heap node referenced by cf in rcuFresh. 
This is part of enforcing locality on the heap mutation and is captured by the 
assertion N1 = N2 in the type rule (T-REPLACE). 


Inserting a Fresh Node. T-INSERT type checks heap updates that link a fresh 
node into a linked data structure. Inserting a rcuFresh reference also faces some 
of the aliasing complications that we have already discussed for direct unlinking 
and replacing a node. Unlike the replacement case, the path to the last heap 
node (the referent of o) from the root is extended by f, which risks falsifying the 
paths for aliases and descendants of o. The final assumption (the implication) of 
T-INSERT checks for this inconsistency. 

There is also another rule, T-LINKF-NULL, not shown in Fig. 6, which han- 
dles the case where the fields of the fresh node are not object references, but 
instead all contain null (e.g., for appending to the end of a linked list or inserting 
a leaf node in a tree). 


Critical Sections (Referencing inside RCU Blocks). We introduce the 
syntactic sugar RCUWrite x.f as y in {C} for write-side critical sections; 
the analogous syntactic sugar for read-side critical sections can be found 
in Appendix E of the technical report [20]. 

The type system ensures unlinked and freeable references are handled linearly, 
as they cannot be dropped (coerced to undef). The top-level rule TORCUWRITE 
in Fig. 6 ensures unlinked references have been freed by forbidding them in the 
critical section’s post-type environment. Our technical report [20] also includes 
the analogous rule TORCUREAD for the read critical section in Figure 33 of 
Appendix E. 
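The exit check of a write-side block can be sketched as a scan of the final environment. The environment is simplified to a map from variable to type name, and the function name `leaked_at_exit` is hypothetical; it stands in for the NoFresh, NoUnlinked and NoFreeable premises taken together.

```python
FORBIDDEN_AT_EXIT = {"rcuFresh", "unlinked", "freeable"}


def leaked_at_exit(env):
    """Return the variables whose types may not survive the end of a write block."""
    return {var: ty for var, ty in env.items() if ty in FORBIDDEN_AT_EXIT}


clean = leaked_at_exit({"par": "rcultr", "cur": "undef"})
leaky = leaked_at_exit({"par": "rcultr", "cur": "unlinked", "nw": "rcuFresh"})
```

An empty result means the block may be accepted in this sketch; a non-empty result names the references that were unlinked but never freed, or allocated but never linked in.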

Preventing the reuse of rcultr references across critical sections is subtler: 
the non-critical section system is not flow-sensitive, and does not include rcultr. 
Therefore, the initial environment lacks rcultr references, and trailing rcultr ref- 
erences may not escape. 
5 Evaluation 


We have used our type system to check correct use of RCU primitives in two 
RCU data structures representative of the broader space. 

Figure 1 gives the type-annotated code for add and remove operations on a 
linked list implementation of a bag data structure, following McKenney’s exam- 
ple [25]. Our technical report [20] contains code for membership checking. 

We have also type checked the most challenging part of an RCU binary search 
tree, the deletion (which also contains the code for a lookup). Our implemen- 
tation is a slightly simplified version of the Citrus BST [3]: their code supports 
fine-grained locking for multiple writers, while ours supports only one writer by 
virtue of using our single-writer primitives. For lack of space the annotated code 
is only in Appendix B of the technical report [20], but here we emphasise the 
important aspects of our type system by showing how it types the BST 
delete method, which also includes looking up the node to be deleted. 

In Fig. 8, we show the steps for deleting the heap node H1. To locate the 
node H1, as shown in Fig. 8a, we first traverse the subtree T0 with references pr 
and cr, where pr is the parent of cr during traversal: 


pr : rcultr (l|r)^k {l|r ↦ cr}, cr : rcultr (l|r)^k.(l|r) {} 


Traversal of T0 is summarized as (l|r)^k. The most subtle aspect of the deletion 
is the final step in the case the node H1 to remove has both children; as shown 
in Fig. 8b, the code must traverse the subtree T4 to locate the next element in 
collection order: the node Hs, the left-most node of H3’s right child (sc) and its 
parent (lp): 


lp : (l|r)^k.(l|r).r.(l|r)^m {l|r ↦ sc}, sc : (l|r)^k.(l|r).r.l.(l)^m {} 


where the traversal of T4 is summarized as (l|r)^m. 

Then Hs is copied into a new freshly-allocated node as shown in Fig. 8b, which 
is then used to replace node H1 as shown in Fig. 8c: the replacement’s fields 
exactly match H1’s except for the data (T-REPLACE via N1 = N2) as shown in 
Fig. 8b, and the parent is updated to reference the replacement, unlinking H1. 

At this point, as shown in Figs. 8c and d, there are two nodes with the 
same value in the tree (the weak BST property of the Citrus BST [3]): the 
replacement node, and what was the left-most node under H3’s right child. 
This latter (original) node Hs must be unlinked as shown in Fig. 8e, which is 
simpler because by being left-most the left child is null, avoiding another round 
of replacement (T-UNLINKH via ∀f ∈ dom(N1). f ≠ f2 ⟹ (N1(f) = null)). 

Traversing T4 to find the successor complicates the reasoning in an interesting 
way. After the successor node Hs is found in Fig. 8b, there are two local unlinking 
operations as shown in Figs. 8c and e, at different depths of the tree. This is why 
the type system must keep separate abstract iteration counts, e.g., k of (l|r)^k 
or m of (l|r)^m, for traversals in loops: these indices act like multiple cursors 
into the data structure, and allow the types to carry enough information to keep 
those changes separate and ensure neither introduces a cycle. 
(a) The writer traverses subtree T0 to find the heap node H1 with local 
references pr and cr. Black-filled node representing the null node. 

(b) Traverse subtree T4 starting from H3 with references lp and sc to find 
successor Hs of H1. Duplicating Hs as a fresh heap node before replacing H1 
with the fresh one. 

(c) Replace H1 with fresh successor and synchronize with the readers. 

(d) Unlinks old successor referenced by sc. 

(e) Safe unlinking of the old successor whose left subtree is null. 

(f) Reclamation of the old successor. 


Fig. 8. Delete of a heap node with two children in BST [3]. 


To the best of our knowledge, we are the first to check such code for memory- 
safe use of RCU primitives modularly, without appeal to the specific implemen- 
tation of RCU primitives. 
6 Soundness 


This section outlines the proof of type soundness; our full proof appears in the 
accompanying technical report [20]. We prove type soundness by embedding the 
type system into an abstract concurrent separation logic called the Views Frame- 
work [9], which when given certain information about proofs for a specific lan- 
guage (primitives and primitive typing) gives back a full program logic including 
choice and iteration. As with other work taking this approach [13,14], this con- 
sists of several key steps explained in the following subsections, but a high-level 
informal soundness argument is twofold. First, because the parameters given to 
the Views framework ensure the Views logic’s Hoare triples {-}C{-} are sound, 
this proves soundness of the type rules with respect to type denotations. Second, 
as our denotation of types encodes the property that the post-environment of 
any type rule accurately characterizes which memory is linked vs. unlinked, etc., 
and the global invariants ensure all allocated heap memory is reachable from 
the root or from some thread’s stack, this entails that our type system prevents 
memory leaks. 


6.1 Proof 


This section provides more details on how the Views Framework [9] is used to 
prove soundness, giving the major parameters to the framework and outlining 
global invariants and key lemmas. 


Logical State. Section 3 defined what Views calls atomic actions (the primitive 
operations) and their semantics on runtime machine states. The Views Frame- 
work uses a separate notion of instrumented (logical) state over which the logic 
is built, related by a concretization function |—| taking an instrumented state 
to the machine states of Sect. 3. Most often—including in our proof—the logical 
state adds useful auxiliary state to the machine state, and the concretization is 
simply projection. Thus we define our logical states LState as: 

— A machine state, σ = (s, h, l, rt, R, B) 

— An observation map, O, of type Loc → P(obs) 

— Undefined variable map, U, of type P(Var × TID) 

— Set of threads, T, of type P(TIDS) 

— A to-free map (or free list), F, of type Loc → P(TID) 


The thread ID set T includes the thread ID of all running threads. The free map 
F tracks which reader threads may hold references to each location. It is not 
required for execution of code, and for validating an implementation could be 
ignored, but we use it later with our type system to help prove that memory 
deallocation is safe. The (per-thread) variables in the undefined variable map U 
are those that should not be accessed (e.g., dangling pointers). 

The remaining component, the observation map O, requires some further 
explanation. Each memory allocation/object can be observed in one of the fol- 
lowing states by a variety of threads, depending on how it was used. 


obs := iterator tid | unlinked | fresh | freeable | root 
An object can be observed as part of the structure (iterator), removed but 
possibly accessible to other threads, freshly allocated, safe to deallocate, or the 
root of the structure. 
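The shape of a logical state can be sketched as a record whose concretization is plain projection. The class `LState`, its attribute names, and the encoding of observations as strings or `("iterator", tid)` pairs are assumptions made for illustration; only the five components and the role of the concretization function come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class LState:
    machine: tuple                                    # sigma = (s, h, l, rt, R, B)
    observations: dict = field(default_factory=dict)  # O: Loc -> set of obs
    undefined: set = field(default_factory=set)       # U: set of (Var, TID) pairs
    threads: set = field(default_factory=set)         # T: running thread IDs
    to_free: dict = field(default_factory=dict)       # F: Loc -> set of reader TIDs


def concretize(state):
    """Forget the auxiliary components and keep only the machine state."""
    return state.machine


sigma = ({}, {}, "writer", "root", {"t1"}, set())
logical = LState(
    machine=sigma,
    observations={"loc1": {("iterator", "t1")}, "loc2": {"unlinked"}},
    threads={"writer", "t1"},
)
```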


Invariants of RCU Views and Denotations of Types. Next, we aim to con- 
vey the intuition behind the predicate WellFormed which enforces global invari- 
ants on logical states, and how it interacts with the denotations of types (Fig. 9) 
in key ways. 

WellFormed is the conjunction of a number of more specific invariants, which 
we outline here. For full details, see Appendix A.2 of the technical report [20]. 


The Invariant for Read Traversal. Reader threads access valid heap locations
even during the grace period. The validity of their heap accesses is ensured by
the observations they make over the heap locations, which can only be iterator
because readers can only use local rcuItr references. To this end, a Readers-Iterators-Only
invariant asserts that reader threads can only observe a heap location as iterator.


Invariants on Grace-Period. Our logical state includes a “free list” auxiliary 
state tracking which readers are still accessing each unlinked node during grace 
periods. This must be consistent with the bounding thread set B in the machine 
state, and this consistency is asserted by the Readers-In-Free-List invariant. This 
is essentially tracking which readers are being “shown grace” for each location. 
The Iterators-Free-List invariant complements this by asserting all readers with 
such observations on unlinked nodes are in the bounding thread set. 

The writer thread can refer to a heap location in the free list with a local
reference of type either freeable or unlinked. Once the writer unlinks a heap
node, it first observes the heap node as unlinked and then as freeable. The denotation of
freeable is only valid following a grace period: it asserts that no readers hold aliases
of the freeable reference. The denotation of unlinked permits either the same
(perhaps no readers overlapped) or that the node is in the to-free list.
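The interplay between the free list F and the bounding set B can be illustrated with a small sketch. This is a simplified reading of the prose above, not the formal invariants of the technical report; the function names and the `reader_exits` step are hypothetical and introduced only for this example.

```python
def readers_in_free_list(F, B):
    """Simplified consistency check: every reader recorded for a
    to-be-freed location must be a bounding thread."""
    return all(readers <= B for readers in F.values())

def may_free(loc, F):
    """A location is safe to deallocate once it is in the free list
    and no reader is recorded for it any more."""
    return loc in F and not F[loc]

def reader_exits(tid, F, B):
    """Hypothetical step: a reader leaving its critical section stops
    being a bounding thread and is dropped from every to-free set."""
    new_B = B - {tid}
    new_F = {loc: readers - {tid} for loc, readers in F.items()}
    return new_F, new_B

# Location 7 was unlinked while readers 2 and 3 might still hold references.
F, B = {7: {2, 3}}, {2, 3}
F1, B1 = reader_exits(2, F, B)
F2, B2 = reader_exits(3, F1, B1)
```

In this sketch location 7 only becomes freeable after both overlapping readers have left, which mirrors the unlinked-then-freeable progression described above.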


Invariants on Safe Traversal Against Unlinking. The write-side critical section
must guarantee that no updates to the heap cause invalid memory accesses. The
Writer-Unlink invariant asserts that a heap location observed as iterator by the
writer thread cannot be observed differently by other threads. The denotation of
the writer thread’s rcuItr reference, $\llbracket \mathsf{rcuItr}\ \rho\ \mathcal{N} \rrbracket_{tid}$, asserts that following a path
from the root compatible with ρ reaches the referent, and that all locations along it are observed as
iterator.

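The denotation of a reader thread’s rcuItr reference, $\llbracket \mathsf{rcuItr} \rrbracket_{tid}$, and the invariants
Readers-Iterator-Only, Iterators-Free-List and Readers-In-Free-List together
assert that a reader thread (which can also be a bounding thread) can view an
unlinked heap location (which can be in the free list) only as iterator. At the
same time, it is essential that reader threads arriving after a node is unlinked
cannot access it. The invariants Unlinked-Reachability and Free-List-Reachability
ensure that any unlinked nodes are reachable only from other unlinked nodes,
and never from the root.

$$
\llbracket x : \mathsf{rcuItr}\ \rho\ \mathcal{N} \rrbracket_{tid} = \left\{ m \in \mathcal{M} \ \middle|\ \begin{array}{l}
(\mathsf{iterator}\ tid \in O(s(x, tid))) \land (x \notin U) \\
\land\ (\forall f_i \in dom(\mathcal{N}),\ x_i \in codom(\mathcal{N}).\ s(x_i, tid) = h(s(x, tid), f_i) \land \mathsf{iterator}\ tid \in O(s(x_i, tid))) \\
\land\ (\forall \rho', \rho''.\ \rho'.\rho'' = \rho \implies \mathsf{iterator}\ tid \in O(h^{*}(rt, \rho'))) \\
\land\ h^{*}(rt, \rho) = s(x, tid) \land (l = tid \land s(x, \_) \notin dom(F))
\end{array} \right\}
$$

$$
\llbracket x : \mathsf{rcuItr} \rrbracket_{tid} = \left\{ m \in \mathcal{M} \ \middle|\ \begin{array}{l}
(\mathsf{iterator}\ tid \in O(s(x, tid))) \land (x \notin U) \\
\land\ (tid \in B) \implies (\exists T' \subseteq B.\ \{s(x, tid) \mapsto T'\} \cap F \neq \emptyset) \land (tid \in T')
\end{array} \right\}
$$

$$
\llbracket x : \mathsf{unlinked} \rrbracket_{tid} = \left\{ m \in \mathcal{M} \ \middle|\ \begin{array}{l}
(\mathsf{unlinked} \in O(s(x, tid)) \land l = tid \land x \notin U) \\
\land\ (\exists T' \subseteq T.\ s(x, tid) \mapsto T' \in F \implies T' \subseteq B \land tid \notin T')
\end{array} \right\}
$$

$$
\llbracket x : \mathsf{freeable} \rrbracket_{tid} = \left\{ m \in \mathcal{M} \ \middle|\ \mathsf{freeable} \in O(s(x, tid)) \land l = tid \land x \notin U \land s(x, tid) \mapsto \{\emptyset\} \in F \right\}
$$

$$
\llbracket x : \mathsf{rcuFresh}\ \mathcal{N} \rrbracket_{tid} = \left\{ m \in \mathcal{M} \ \middle|\ \begin{array}{l}
(\mathsf{fresh} \in O(s(x, tid)) \land x \notin U \land s(x, tid) \notin dom(F)) \\
\land\ (\forall f_i \in dom(\mathcal{N}),\ x_i \in codom(\mathcal{N}).\ s(x_i, tid) = h(s(x, tid), f_i) \\
\qquad \land\ \mathsf{iterator}\ tid \in O(s(x_i, tid)) \land s(x_i, tid) \notin dom(F))
\end{array} \right\}
$$

$$
\llbracket x : \mathsf{undef} \rrbracket_{tid} = \left\{ m \in \mathcal{M} \ \middle|\ (x, tid) \in U \land s(x, tid) \notin dom(F) \right\}
$$

$$
\llbracket x : \mathsf{rcuRoot} \rrbracket_{tid} = \left\{ m \in \mathcal{M} \ \middle|\ rt \notin U \land s(x, tid) = rt \land rt \in dom(h) \land O(rt) \in \mathsf{root} \land s(x, tid) \notin dom(F) \right\}
$$

provided $h^{*} : (\mathsf{Loc} \times \mathsf{Path}) \rightharpoonup \mathsf{Val}$


Fig. 9. Type environments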


Invariants on Safe Traversal Against Inserting/Replacing. A writer replacing an
existing node with a fresh one, or inserting a single fresh node, assumes the fresh
node is unreachable to readers before it is published/linked.
The Fresh-Writes invariant asserts that a fresh heap location can only be allocated
and referenced by the writer thread. The relation between a freshly allocated
heap location and the rest of the heap is established by the Fresh-Reachable invariant,
which requires that no heap node points to the freshly allocated
one. This invariant supports the preservation of the tree structure. The
Fresh-Not-Reader invariant supports the safe traversal of the reader threads by asserting
that they cannot observe a heap location as fresh. Moreover, the denotation
of the rcuFresh type, $\llbracket \mathsf{rcuFresh}\ \mathcal{N} \rrbracket_{tid}$, enforces that fields in $\mathcal{N}$ point to valid heap
locations (observed as iterator by the writer thread).


Invariants on Tree Structure. Our invariants enforce tree-structured heap
layouts for data structures. The Unique-Reachable invariant asserts that every
heap location reachable from the root can only be reached by following a unique
path. To preserve the tree structure, Unique-Root enforces unreachability of the
root from any heap location that is reachable from the root itself.
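To make the tree-shape requirement concrete, the following sketch checks both properties on a toy heap. The heap representation (a dictionary from locations to field maps) and the name `is_tree` are assumptions made for this example; the formal invariants are stated over the logical state, not over this encoding.

```python
def is_tree(heap, root):
    """Return True if every location reachable from `root` is reached by
    exactly one path, which also rules out any path back to the root.

    `heap` maps a location to a dict of field name -> child location (or None).
    """
    seen = set()
    stack = [root]
    while stack:
        loc = stack.pop()
        if loc in seen:
            # Reached a second time: two distinct paths, or a cycle.
            return False
        seen.add(loc)
        for child in heap.get(loc, {}).values():
            if child is not None:
                stack.append(child)
    return True

tree = {1: {"l": 2, "r": 3}, 2: {}, 3: {}}
shared = {1: {"l": 2, "r": 2}, 2: {}}      # node 2 has two paths from the root
cyclic = {1: {"n": 2}, 2: {"n": 1}}        # the root is reachable from node 2
```

A location is pushed once per incoming edge from a reachable node, so popping it twice is exactly the situation that Unique-Reachable (or Unique-Root, when the location is the root) forbids.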


Type Environments. Assertions in the Views logic are (almost) sets of the 
logical states that satisfy a validity predicate WellFormed, outlined above: 


$$\mathcal{M} \stackrel{def}{=} \{ m \in (\mathsf{MState} \times O \times U \times T \times F) \mid \mathsf{WellFormed}(m) \}$$


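Every type environment represents a set of possible views (WellFormed logical
states) consistent with the types in the environment. We make this precise with
a denotation function


$$\llbracket - \rrbracket_{\_} : \mathsf{TypeEnv} \rightarrow \mathsf{TID} \rightarrow \mathcal{P}(\mathcal{M})$$


that yields the set of states corresponding to a given type environment. This is
defined as the intersection of individual variables’ types as in Fig. 9.


$$
\bullet \stackrel{def}{=} (\bullet_\sigma, \bullet_O, \cup, \cup) \qquad (F_1 \bullet_F F_2) \stackrel{def}{=} F_1 \cup F_2 \ \text{ when } dom(F_1) \cap dom(F_2) = \emptyset
$$

$$
(O_1 \bullet_O O_2)(loc) \stackrel{def}{=} O_1(loc) \cup O_2(loc) \qquad (s_1 \bullet_s s_2) \stackrel{def}{=} s_1 \cup s_2 \ \text{ when } dom(s_1) \cap dom(s_2) = \emptyset
$$

$$
(h_1 \bullet_h h_2)(o, f) \stackrel{def}{=} \begin{cases}
\mathsf{undef} & \text{if } h_1(o, f) = v \land h_2(o, f) = v' \land v' \neq v \\
v & \text{if } h_1(o, f) = v \land h_2(o, f) = v \\
v & \text{if } h_1(o, f) = \mathsf{undef} \land h_2(o, f) = v \\
v & \text{if } h_1(o, f) = v \land h_2(o, f) = \mathsf{undef} \\
\mathsf{undef} & \text{if } h_1(o, f) = \mathsf{undef} \land h_2(o, f) = \mathsf{undef}
\end{cases}
$$

$$
((s, h, l, rt, R, B), O, U, T, F)\ \mathcal{R}_0\ ((s', h', l', rt', R', B'), O', U', T', F') \stackrel{def}{=} \bigwedge \left\{ \begin{array}{l}
l \in T \rightarrow (h = h' \land l = l') \\
l \in T \rightarrow F = F' \\
\forall tid, o.\ \mathsf{iterator}\ tid \in O(o) \rightarrow o \in dom(h) \\
\forall tid, o.\ \mathsf{iterator}\ tid \in O(o) \rightarrow o \in dom(h') \\
\forall tid, o.\ \mathsf{root}\ tid \in O(o) \rightarrow o \in dom(h) \\
\forall tid, o.\ \mathsf{root}\ tid \in O(o) \rightarrow o \in dom(h') \\
O = O' \land U = U' \land T = T' \land R = R' \land rt = rt' \\
\forall x, t \in T.\ s(x, t) = s'(x, t)
\end{array} \right.
$$


Fig. 10. Composition ($\bullet$) and Thread Interference Relation ($\mathcal{R}_0$)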

Individual variables’ denotations are extended to context denotations slightly 
differently depending on whether the environment is a reader or writer thread 
context: writer threads own the global lock, while readers do not: 


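— For read-side as $\llbracket x_1 : T_1, \ldots, x_n : T_n \rrbracket_{\mathsf{R},tid} = \llbracket x_1 : T_1 \rrbracket_{tid} \cap \ldots \cap \llbracket x_n : T_n \rrbracket_{tid} \cap \llbracket \mathsf{R} \rrbracket_{tid}$, where $\llbracket \mathsf{R} \rrbracket_{tid} = \{ (s, h, l, rt, R, B), O, U, T, F \mid tid \in R \}$

— For write-side as $\llbracket x_1 : T_1, \ldots, x_n : T_n \rrbracket_{\mathsf{M},tid} = \llbracket x_1 : T_1 \rrbracket_{tid} \cap \ldots \cap \llbracket x_n : T_n \rrbracket_{tid} \cap \llbracket \mathsf{M} \rrbracket_{tid}$, where $\llbracket \mathsf{M} \rrbracket_{tid} = \{ (s, h, l, rt, R, B), O, U, T, F \mid tid = l \}$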
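Viewing each denotation as a predicate on logical states, the two context denotations above are conjunctions of the per-variable predicates with a side condition on the thread. The sketch below assumes a logical state flattened into a plain dictionary and uses names of our own choosing (`denote_env`, `undef`); only the undef denotation of Fig. 9 is spelled out.

```python
def undef(x, tid, m):
    """Denotation of `x : undef`: (x, tid) is in U and s(x, tid) is not in dom(F)."""
    return (x, tid) in m["U"] and m["s"].get((x, tid)) not in m["F"]

def denote_env(env, tid, side):
    """Membership test for the denotation of a type environment.

    env  : dict mapping variable name -> per-variable denotation (a predicate)
    side : "M" for the write-side context, "R" for the read-side context
    """
    def holds(m):
        if side == "M":
            side_ok = (m["l"] == tid)     # the writer holds the global lock
        else:
            side_ok = (tid in m["R"])     # the reader is in the reader set
        return side_ok and all(d(x, tid, m) for x, d in env.items())
    return holds

# Thread 1 holds the lock; its variable x is marked undefined.
m = {"s": {("x", 1): 5}, "U": {("x", 1)}, "F": {}, "l": 1, "R": {2}}
writer_view = denote_env({"x": undef}, 1, "M")
reader_view = denote_env({"x": undef}, 1, "R")
```

Here `writer_view(m)` holds while `reader_view(m)` does not, because thread 1 is not in the reader set of `m`.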


Composition and Interference. To support framing (weakening), the Views
Framework requires that views form a partial commutative monoid under an
operation $\bullet : \mathcal{M} \rightarrow \mathcal{M} \rightarrow \mathcal{M}$, provided as a parameter to the framework. The
framework also requires an interference relation $\mathcal{R} \subseteq \mathcal{M} \times \mathcal{M}$ between views
to reason about local updates to one view preserving validity of adjacent views
(akin to the small-footprint property of separation logic). Figure 10 defines our
composition operator and the core interference relation $\mathcal{R}_0$; the actual interference
between views (between threads, or between a local action and framed-away
state) is the reflexive transitive closure of $\mathcal{R}_0$. Composition is mostly straightforward
point-wise union (threads’ views may overlap) of each component. Interference
bounds the interference writers and readers may inflict on each other.
Notably, if a view contains the writer thread, other threads may not modify the
shared portion of the heap, or release the writer lock. Other aspects of interference
are natural restrictions, such as that threads may not modify each other’s local
variables. WellFormed states are closed under both composition (with another
WellFormed state) and interference ($\mathcal{R}$ relates WellFormed states only to other
WellFormed states).


$$
\begin{aligned}
\llbracket \mathsf{if}\ (x.f == y)\ C_1\ C_2 \rrbracket_{tid} &\stackrel{def}{=} z = x.f;\ ((\mathsf{assume}(z = y); C_1) + (\mathsf{assume}(z \neq y); C_2)) \\
\llbracket \mathsf{while}\ (e)\ C \rrbracket &\stackrel{def}{=} (\mathsf{assume}(e); C)^{*};\ (\mathsf{assume}(\neg e)) \\
\llbracket \mathsf{assume}(S) \rrbracket (s) &\stackrel{def}{=} \begin{cases} \{s\} & \text{if } s \in S \\ \emptyset & \text{otherwise} \end{cases}
\end{aligned}
$$

$$
\frac{\{P\} \cap \{\lceil S \rceil\} \sqsubseteq \{Q\}}{\{P\}\ \mathsf{assume}(S)\ \{Q\}} \qquad \text{where } \lceil S \rceil = \{ m \mid \lfloor m \rfloor \cap S \neq \emptyset \}
$$


Fig. 11. Encoding branch conditions with assume(b)
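The heap and free-list cases of the composition operator in Fig. 10 can be sketched as follows. The dictionary encoding, the `UNDEF` sentinel and the function names are assumptions of this sketch; a missing key stands for an undefined heap cell.

```python
UNDEF = object()  # sentinel for an undefined heap cell

def compose_heaps(h1, h2):
    """Point-wise composition of two heaps keyed by (location, field).

    Agreeing or one-sided cells are kept; cells on which the two heaps
    disagree become UNDEF.
    """
    out = {}
    for key in h1.keys() | h2.keys():
        v1 = h1.get(key, UNDEF)
        v2 = h2.get(key, UNDEF)
        if v1 is UNDEF:
            out[key] = v2
        elif v2 is UNDEF or v1 == v2:
            out[key] = v1
        else:
            out[key] = UNDEF
    return out

def compose_free_lists(f1, f2):
    """Union of two free lists, defined only when their domains are disjoint.
    Returns None when the composition is undefined."""
    if f1.keys() & f2.keys():
        return None
    return {**f1, **f2}

overlapping = compose_heaps({(1, "f"): 2}, {(1, "f"): 2, (3, "f"): 4})
conflicting = compose_heaps({(1, "f"): 2}, {(1, "f"): 5})
```

Allowing heaps to overlap where they agree is what lets different threads' views share the RCU-protected structure, whereas the free list is split disjointly.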


Stable Environment and Views Shift. The framing/weakening type rule will
be translated to a use of the frame rule in the Views Framework’s logic. There,
separating conjunction is simply the existence of two composable instrumented
states:


$$m \in P * Q \stackrel{def}{=} \exists m'.\ \exists m''.\ m' \in P \land m'' \in Q \land m \in m' \bullet m''$$


In order to validate the frame rule in the Views Framework’s logic, the assertions
in its logic (sets of well-formed instrumented states) must be restricted to sets
of logical states that are stable with respect to expected interference from other
threads or contexts, and interference must be compatible in some way with
separating conjunction. Thus the Views, the actual base assertions in the Views
logic, are:


$$\mathsf{View}_{\mathcal{M}} \stackrel{def}{=} \{ M \in \mathcal{P}(\mathcal{M}) \mid \mathcal{R}(M) \subseteq M \}$$


Additionally, interference must distribute over composition: 


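$$\forall m_1, m_2, m.\ (m_1 \bullet m_2)\ \mathcal{R}\ m \implies \exists m_1', m_2'.\ m_1\ \mathcal{R}\ m_1' \land m_2\ \mathcal{R}\ m_2' \land m \in m_1' \bullet m_2'$$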
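Stability is easy to state operationally. The sketch below represents an interference relation as a finite set of pairs over abstract state labels; this encoding and the names `image` and `is_stable` are ours, chosen only to illustrate the definition of a View.

```python
def image(R, M):
    """R(M): every state related by R to some state in M."""
    return {after for (before, after) in R if before in M}

def is_stable(R, M):
    """M is a view when interference cannot lead outside of it."""
    return image(R, M) <= M

# Interference may move state 1 to state 2, and state 2 to state 3.
R0 = {(1, 2), (2, 3)}
```

Here `{2, 3}` and `{1, 2, 3}` are stable while `{1}` is not. Checking closure under single steps of the core relation suffices: a set closed under one step is, by induction on the number of steps, closed under the reflexive transitive closure as well.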


Because we use this induced Views logic to prove soundness of our type 
system by translation, we must ensure any type environment denotes a valid 
view: 


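Lemma 1 (Stable Environment Denotation-M). For any closed environment
Γ (i.e., $\forall x \in dom(\Gamma).\ FV(\Gamma(x)) \subseteq dom(\Gamma)$): $\mathcal{R}(\llbracket \Gamma \rrbracket_{\mathsf{M},tid}) \subseteq \llbracket \Gamma \rrbracket_{\mathsf{M},tid}$.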


Alternatively, we say that environment denotation is stable (closed under R). 
Proof. In Appendix A.1 Lemma 7 of the technical report [20]. 


We elide the statement of the analogous result for the read-side critical section, 
available in Appendix A.1 of the technical report. 

With this setup done, we can state the connection between the Views Framework
logic induced by earlier parameters, and the type system from Sect. 4. The
induced Views logic has a familiar notion of Hoare triple, {p}C{q} where p and
q are elements of $\mathsf{View}_{\mathcal{M}}$, with the usual rules for non-deterministic choice,
non-deterministic iteration, sequential composition, and parallel composition, sound
given the proof obligations just described above. It is parameterized by a rule
for atomic commands that requires a specification of the triples for primitive
operations, and their soundness (an obligation we must prove). This can then be
used to prove that every typing derivation embeds to a valid derivation in the
Views Logic, roughly $\forall \Gamma, C, \Gamma', tid.\ \Gamma \vdash C \dashv \Gamma' \implies \{\llbracket \Gamma \rrbracket_{tid}\}\ \llbracket C \rrbracket_{tid}\ \{\llbracket \Gamma' \rrbracket_{tid}\}$, once
for the writer type system, once for the readers.

There are two remaining subtleties to address. First, commands C also
require translation: the Views Framework has only non-deterministic branches
and loops, so the standard versions from our core language must be encoded.
Figure 11 shows the encoding, which relies on assume(b), a standard idea in
verification semantics [4,30]: assume(b) “does nothing” (freezes) if the
condition b is false, so
its postcondition in the Views logic can reflect the truth of b. assume in Fig. 11
adapts this for the Views Framework as in other Views-based proofs [13,14],
specifying sets of machine states as a predicate. We write boolean expressions
as shorthand for the set of machine states making that expression true. With
this setup done, the top-level soundness claim then requires proving, once for
the reader type system and once for the writer type system, that every valid
source typing derivation corresponds to a valid derivation in the Views logic:
$\forall \Gamma, C, \Gamma'.\ \Gamma \vdash C \dashv \Gamma' \implies \{\llbracket \Gamma \rrbracket\}\ \llbracket C \rrbracket\ \{\llbracket \Gamma' \rrbracket\}$.
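The assume-based encoding of Fig. 11 can be sketched with commands modelled as functions from a state to the set of possible result states. The combinator names and the use of integers as states are assumptions of this sketch, not part of the Views Framework.

```python
def assume(pred):
    """Pass the state through if `pred` holds; otherwise produce no result."""
    return lambda s: {s} if pred(s) else set()

def seq(c1, c2):
    """Sequential composition: run c2 from every result of c1."""
    return lambda s: {s2 for s1 in c1(s) for s2 in c2(s1)}

def choice(c1, c2):
    """Non-deterministic choice: the union of both outcomes."""
    return lambda s: c1(s) | c2(s)

def star(c):
    """Non-deterministic iteration: all states reachable by zero or more runs
    of c. Terminates only when that set of reachable states is finite."""
    def run(s):
        seen, frontier = {s}, [s]
        while frontier:
            cur = frontier.pop()
            for nxt in c(cur):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return seen
    return run

def if_then_else(cond, c1, c2):
    return choice(seq(assume(cond), c1),
                  seq(assume(lambda s: not cond(s)), c2))

def while_loop(cond, body):
    return seq(star(seq(assume(cond), body)),
               assume(lambda s: not cond(s)))

inc = lambda s: {s + 1}
times10 = lambda s: {s * 10}
is_even = lambda s: s % 2 == 0
```

Although `choice` explores both branches, the `assume` guards discard the branch whose condition is false, so `if_then_else(is_even, inc, times10)` behaves deterministically, and `while_loop(lambda s: s < 3, inc)` started from 0 yields only the state 3.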

Second, we have not addressed a way to encode subtyping. One might hope
this corresponds to a kind of implication, and therefore subtyping corresponds to
consequence. Indeed, this is how we (and prior work [13,14]) address subtyping
in a Views-based proof. Views defines the notion of view shift² (⊑) as a way to
reinterpret a set of instrumented states as a new (compatible) set of instrumented
states, offering a kind of logical consequence, used in a rule of consequence in
the Views logic:


$$p \sqsubseteq q \stackrel{def}{=} \forall m \in \mathcal{M}.\ \lfloor p * \{m\} \rfloor \subseteq \lfloor q * \mathcal{R}(\{m\}) \rfloor$$


We are now finally ready to prove the key lemmas of the soundness proof,
relating subtyping to view shifts, proving soundness of the primitive actions, and
finally for the full type system. These proofs occur once for the writer type
system, and once for the reader; we show here only the (more complex) writer
obligations:


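Lemma 2 (Axiom of Soundness for Atomic Commands). For each
axiom, $\Gamma_1 \vdash_{\mathsf{M}} \alpha \dashv \Gamma_2$, we show $\forall m.\ \llbracket \alpha \rrbracket (\lfloor \llbracket \Gamma_1 \rrbracket_{tid} * \{m\} \rfloor) \subseteq \lfloor \llbracket \Gamma_2 \rrbracket_{tid} * \mathcal{R}(\{m\}) \rfloor$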


Proof. By case analysis on a. Details in Appendix A.1 of the technical report [20]. 


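Lemma 3 (Context-SubTyping-M). $\Gamma \prec: \Gamma' \implies \llbracket \Gamma \rrbracket_{\mathsf{M},tid} \sqsubseteq \llbracket \Gamma' \rrbracket_{\mathsf{M},tid}$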


Proof. Induction on the subtyping derivation, then inducting on the single-type 
subtype relation for the first variable in the non-empty context case. 


Lemma 4 (Views Embedding for Write-Side).
$\forall \Gamma, C, \Gamma', t.\ \Gamma \vdash_{\mathsf{M}} C \dashv \Gamma' \implies \llbracket \Gamma \rrbracket_{t} \cap \llbracket \mathsf{M} \rrbracket_{t} \vdash \llbracket C \rrbracket_{t} \dashv \llbracket \Gamma' \rrbracket_{t} \cap \llbracket \mathsf{M} \rrbracket_{t}$


Proof. By induction on the typing derivation, appealing to Lemma 2 for primitives,
Lemma 3 and consequence for subtyping, and otherwise appealing to structural
rules of the Views logic and inductive hypotheses. Full details in Appendix
A.1 of the technical report [20].


² This is the same notion present in later program logics like Iris [18], though more
recent variants are more powerful.


The corresponding obligations and proofs for the read-side critical section 
type system are similar in statement and proof approach, just for the read-side 
type judgments and environment denotations. 


7 Discussion and Related Work 


Our type system builds on a great deal of related work on RCU implementations
and models, and on general concurrent program verification. Due to space limits,
this section covers only program logics, modeling RCU, and
memory models, but our technical report [20] includes detailed discussions on
model-checking [8, 17,21], language oriented approaches [6, 16, 16] and realization
of our semantics in an implementation as well.


Modeling RCU and Memory Models. Alglave et al. [2] propose a memory
model to be assumed by the platform-independent parts of the Linux kernel,
regardless of the underlying hardware’s memory model. As part of this, they give 
the first formalization of what it means for an RCU implementation to be correct 
(previously this was difficult to state, as the guarantees in principle could vary by 
underlying CPU architecture). Essentially, reader critical sections must not span 
grace periods. They prove by hand that the Linux kernel RCU implementation [1] 
satisfies this property. McKenney has defined fundamental requirements of RCU 
implementations [26]; our model in Sect. 3 is a valid RCU implementation accord- 
ing to those requirements (assuming sequential consistency) aside from one per- 
formance optimization, Read-to-Write Upgrade, which is important in practice 
but not memory-safety centric — see the technical report [20] for detailed discus- 
sion on satisfying RCU requirements. To the best of our knowledge, ours is the 
first abstract operational model for a Linux kernel-style RCU implementation — 
others are implementation-specific [22] or axiomatic like Alglave et al.’s. 

Tassarotti et al. model a well-known way of implementing RCU synchronization
without hurting readers’ performance, Quiescent State Based Reclamation
(QSBR) [8], where synchronization between the writer thread and reader
threads occurs via per-thread counters. Tassarotti et al. [32] use a protocol-based
program logic based on separation and ghost variables called GPS [34] to verify
a user-level implementation of RCU with a singly linked list client under release-acquire
semantics, which is a weaker memory model than sequential consistency.
Despite the weaker model, the protocol that they enforce on their RCU primitives
is nearly the same as what our type system requires. The reads and writes to
per-thread QSBR structures are similar to our more abstract updates to reader
and bounding sets. Therefore, we anticipate it would be possible to extend our
type system in the future for similar weak memory models.


Program Logics. Fu et al. [12] extend Rely-Guarantee and Separation
Logic [10, 11,35] with the past-tense temporal operator to eliminate the need for
using a history variable and lift the standard separation conjunction to assert
over execution histories. Gotsman et al. [15] take assertions from temporal
logic to separation logic [35] to capture the essence of epoch-based memory reclamation
algorithms and have a simpler proof than that of Fu et al. [12] for
Michael’s non-blocking stack [29] implementation under a sequentially consistent
memory model.

Tassarotti et al. [32] use abstract-predicates — e.g. WriterSafe — that are spe- 
cialized to the singly-linked structure in their evaluation. This means reusing 
their ideas for another structure, such as a binary search tree, would require 
revising many of their invariants. By contrast, our types carry similar informa- 
tion (our denotations are similar to their definitions), but are reusable across at 
least singly-linked and tree data structures (Sect. 5). Their proofs of a linked list 
also require managing assertions about RCU implementation resources, while 
these are effectively hidden in the type denotations in our system. On the other 
hand, their proofs ensure full functional correctness. Meyer and Wolff [28] make
a compelling argument that separating memory safety from correctness is profitable,
and we provide such a decoupled memory safety argument.


8 Conclusions 


We presented the first type system that ensures code uses RCU memory management
safely, and which is significantly simpler than full-blown verification
logics. To this end, we gave the first general operational model for RCU-based
memory management. Based on suitable abstractions for RCU in the operational
semantics, we are the first to show that decoupling the memory-safety
proofs of RCU clients from the underlying reclamation model is possible. Meyer
et al. [28] took a similar approach for decoupling the correctness verification of
the data structures from the underlying reclamation model under the assumption
of memory safety for the data structures. We demonstrated the applicability/reusability
of our types on two examples: a linked-list based bag [25]
and a binary search tree [3]. To the best of our knowledge, we are the first to present
a memory-safety proof for a tree client of RCU. We proved type
soundness by embedding the type system into an abstract concurrent separation
logic called the Views Framework [9] and encoding many RCU properties as either
type denotations or global invariants over abstract RCU state. By doing this,
we discharged these invariants once as a part of the soundness proof and
did not need to prove them for each different client.


Acknowledgements. We are grateful to Matthew Parkinson for guidance and pro-
the paper.


Abstract. Computer scientists are well-versed in dealing with data
structures. The same cannot be said about their dual: codata. Even 
though codata is pervasive in category theory, universal algebra, and 
logic, the use of codata for programming has been mainly relegated 
to representing infinite objects and processes. Our goal is to demon- 
strate the benefits of codata as a general-purpose programming abstrac- 
tion independent of any specific language: eager or lazy, statically or 
dynamically typed, and functional or object-oriented. While codata 
is not featured in many programming languages today, we show how 
codata can be easily adopted and implemented by offering simple inter- 
compilation techniques between data and codata. We believe codata is a 
common ground between the functional and object-oriented paradigms; 
ultimately, we hope to utilize the Curry-Howard isomorphism to further 
bridge the gap.


1 Introduction


Functional programming enjoys a beautiful connection to logic, known as the 
Curry-Howard correspondence, or proofs as programs principle [22]; results and 
notions about a language are translated to those about proofs, and vice-versa 
[17]. In addition to expressing computation as proof transformations, this connec- 
tion is also fruitful for education: everybody would understand that the assump- 
tion “an x is zero” does not mean “every x is zero,” which in turn explains the 
subtle typing rules for polymorphism in programs. The typing rules for modules 
are even more cryptic, but knowing that they correspond exactly to the rules for 
existential quantification certainly gives us more confidence that they are cor- 
rect! While not everything useful must have a Curry-Howard correspondence, we 
believe finding these delightful coincidences where the same idea is rediscovered 
many times in both logic and programming can only be beneficial [42].

One such instance involves codata. In contrast with the mystique it has as
a programming construct, codata is pervasive in mathematics and logic, where 
it arises through the lens of duality. The most visual way to view the duality 
is in the categorical diagrams of sums versus products—the defining arrows go 
into a sum and come out of a product—and in algebras versus coalgebras [25]. 
In proof theory, codata has had an impact on theorem proving [5] and on the 
foundation of computation via polarity [29,45]. Polarity recognizes which of two 
dialogic actors speaks first: the proponent (who seeks to verify or prove a fact) 
or the opponent (who seeks to refute the fact). 

The two-sided, interactive view appears all over the study of programming 
languages, where data is concerned about how values are constructed and codata 
is concerned about how they are used [15]. Sometimes, this perspective is read- 
ily apparent, like with session types [7] which distinguish internal choice (a 
provider’s decision) versus external choice (a client’s decision). But other occurrences
are more obscure, like in the semantics of PCF (i.e. the call-by-name
λ-calculus with numbers and general recursion). In PCF, the result of evaluating
a program must be of a ground type in order to respect the laws of functions
(namely η) [32]. This is not due to differences between ground types versus
“higher types,” but to the fact that data types are directly observable, whereas 
codata types are only indirectly observable via their interface. 

Clearly codata has merit in theoretical pursuits; we think it has merit in 
practical ones as well. The main application of codata so far has been for repre- 
senting infinite objects and coinductive proofs in proof assistants [1,39]. However, 
we believe that codata also makes for an important general-purpose program- 
ming feature. Codata is a bridge between the functional and object-oriented 
paradigms; a common denominator between the two very different approaches 
to programming. On one hand, functional languages are typically rich in data
types (as many as the programmer wants to define via data declarations) but
have a paucity of codata types (usually just function types). On the other hand,
object-oriented languages are rich in codata types (programmer-defined in terms
of classes or interfaces) but have a paucity of data types (usually just primitives like
booleans and numbers). We illustrate this point with a collection of example
applications that arise in both styles of programming, including common encod- 
ings, demand-driven programming, abstraction, and Hoare-style reasoning. 

While codata types can be seen in the shadows behind many examples of 
programming—often hand-compiled away by the programmer—not many func- 
tional languages have native support for them. To this end, we demonstrate a 
pair of simple compilation techniques between a typical core functional language 
(with data types) and one with codata. One direction—based on the well-known 
visitor pattern from object-oriented programming—simultaneously shows how 
to extend an object-oriented language with data types (as is done by Scala) and 
how to compile core functional programs to a more object-oriented setting (e.g. 
targeting a backend like JavaScript or the JVM). The other shows how to add 
native codata types to functional languages by reducing them to commonly-supported
data types and how to compile a “pure” object-oriented style of
programming to a functional setting. Both of these techniques are macro-expansions
that are not specific to any particular language, as they work with
both statically and dynamically typed disciplines, and they preserve the well- 
typed status of programs without increasing the complexity of the types involved. 

Our claim is that codata is a universal programming feature that has been 
thus-far missing or diminished in today’s functional programming languages. 
This is too bad, since codata is not just a feature invented for the convenience 
of programmers, but a persistent idea that has sprung up over and over from 
the study of mathematics, logic, and computation. We aim to demystify codata, 
and en route, bridge the wide gulf between the functional and object-oriented 
paradigms. Fortunately, it is easy for most mainstream languages to add or 
bring out codata today without a radical change to their implementation. But 
ultimately, we believe that the languages of the future should incorporate both 
data and codata outright. To that end, our contributions are to: 


— (Section 2) Illustrate the benefits of codata in both theory and practice: (1) a
decomposition of well-known λ-calculus encodings by inverting the priority
of construction and destruction; (2) a first-class abstraction mechanism; (3) a
method of demand-driven programming; and (4) a static type system for
representing Hoare-style invariants on resource use.

— (Section 3) Provide simple transformations for compiling data to codata,
and vice-versa, which are appropriate for languages with different evaluation
strategies (eager or lazy) and type disciplines (static or dynamic).

— (Section 4) Demonstrate various implementations of codata for general-purpose
programming in two ways: (1) an extension of Haskell with codata;
and (2) a prototype language that compiles to several languages of different
evaluation strategies, type disciplines, and paradigms.


2 The Many Faces of Codata 


Codata can be used to solve other problems in programming besides representing 
infinite objects and processes like streams and servers [1,39]. We start by present- 
ing codata as a merger between theory and practice, whereby encodings of data 
types in an object-oriented style turn out to be a useful intermediate step in the 
usual encodings of data in the λ-calculus. Demand-driven programming is considered
a virtue of lazy languages, but codata is a language-independent tool for
capturing this programming idiom. Codata exactly captures the essence of procedural
abstraction, as achieved with λ-abstractions and objects, with a logically
founded formalism [16]. Specifying pre- and post-conditions of protocols, which is 
available in some object systems [14], is straightforward with indexed, recursive 
codata types, i.e. objects with guarded methods [40]. 


2.1 Church Encodings and Object-Oriented Programming 


Crucial information structures, like booleans, numbers, and lists can be encoded
in the untyped λ-calculus (a.k.a. Church encodings) or in the typed polymorphic
λ-calculus (a.k.a. Böhm-Berarducci [9] encodings). It is quite remarkable that
data structures can be simulated with just first-class, higher-order functions.
The downside is that these encodings can be obtuse at first blush, and have the
effect of obscuring the original program when everything is written with just λs
and application. For example, the λ-representation of the boolean value True,
the first projection out of a pair, and the constant function K are all expressed
as λx.λy.x, which is not immediately evocative of its multi-purpose nature.

Object-oriented programmers have also been representing data structures in 
terms of objects. This is especially visible in the Smalltalk lineage of languages 
like Scala, wherein an objective is that everything that can be an object is. As 
it turns out, the object-oriented features needed to perform this representation 
technique are exactly those of codata. That is because Church-style encodings 
and object-oriented representations of data all involve switching focus from the 
way values are built (i.e. introduced) to the way they are used (i.e. eliminated). 

Consider the representation of Boolean values as an algebraic data type. 
There may be many ways to use a Boolean value. However, it turns out that there 
is a most-general eliminator of Booleans: the expression if b then x else y. 
This basic construct can be used to define all the other uses for Bools. Instead of 
focusing on the constructors True and False let’s then focus on this most-general 
form of Bool elimination; this is the essence of the encodings of booleans in terms 
of objects. In other words, booleans can be thought of as objects that implement
a single method, If, so that the expression if b then x else y would instead
be written as (b.If x y). We then define the true and false values in terms of
their reaction to If:


true = {If x y → x}        false = {If x y → y}


Or alternatively, we can write the same definition using copatterns, popularized 
for use in the functional paradigm by Abel et al. [1] by generalizing the usual 
pattern-based definition of functions by multiple clauses, as: 


true.If x y = x false.If x y = y 


This works just like equational definitions by pattern-matching in functional 
languages: the expression to the left of the equals sign is the same as the expres- 
sion to the right (for any binding of x and y). Either way, the net result is that 
(true.If "yes" "no") is "yes", whereas (false.If "yes" "no") is "no". 
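For readers who prefer a mainstream dynamically typed language, the same idea can be sketched in Python. The class names `TrueObj` and `FalseObj` and the helpers `not_` and `and_` are ours; the only point is that a boolean is fully determined by how it responds to `If`.

```python
class TrueObj:
    """The boolean that answers If by selecting its first argument."""
    def If(self, x, y):
        return x

class FalseObj:
    """The boolean that answers If by selecting its second argument."""
    def If(self, x, y):
        return y

true, false = TrueObj(), FalseObj()

# Other uses of booleans are definable from the single observation If.
def not_(b):
    return b.If(false, true)

def and_(a, b):
    return a.If(b, false)
```

Note that Python evaluates both arguments before calling `If`; when the branches have effects or are expensive, they can be passed as zero-argument functions and the selected one called afterwards.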

This covers the object-based presentation of booleans in a dynamically typed 
language, but how do static types come into play? In order to give a type descrip- 
tion of the above boolean objects, we can use the following interface, analogous 
to a Java interface: 


codata Bool where If : Bool → (forall a. a → a → a)


This declaration is dual to a data declaration in a functional language: data
declarations define the types of constructors (which produce values of the data
type) and codata declarations define the types of destructors (which consume
values of the codata type) like If. The If observation introduces
its own polymorphic type a because an if-then-else might return any type of
result (as long as both branches agree on the type). That way, both
objects true and false above are values of the codata type Bool.

At this point, the representation of booleans as codata looks remarkably close
to the encodings of booleans in the λ-calculus! Indeed, the only difference is that
in the λ-calculus we “anonymize” booleans. Since they reply to only one request,
that request name can be dropped. We then arrive at the familiar encodings in
the polymorphic λ-calculus:


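$$\mathit{Bool} = \forall a.\, a \to a \to a \qquad \mathit{true} = \Lambda a.\lambda x{:}a.\lambda y{:}a.\,x \qquad \mathit{false} = \Lambda a.\lambda x{:}a.\lambda y{:}a.\,y$$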


In addition, the invocation of the If method just becomes ordinary function 
application; b.If x y of type a is written as b a x y. Otherwise, the definition 
and behavior of booleans as either codata types or as polymorphic functions are 
the same. 
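The anonymized version can be sketched in Python with curried lambdas, where invoking If becomes plain function application. Type abstraction and instantiation have no counterpart in Python, so they are simply omitted here, and `and_` is a hypothetical helper added only to show the encoding in use.

```python
# Booleans as curried functions: the request name If has been dropped.
true = lambda x: lambda y: x
false = lambda x: lambda y: y


def and_(p, q):
    # If p is true the answer is q; otherwise the answer is p itself (false).
    return p(q)(p)


# b.If x y is now written as the application b(x)(y).
print(true("yes")("no"))                # yes
print(and_(true, false)("yes")("no"))   # no
```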

This style of inverting the definition of data types—either into specific codata 
types or into polymorphic functions—is also related to another concept in object- 
oriented programming. First, consider how a functional programmer would rep- 
resent a binary Tree (with integer-labeled leaves) and a walk function that 
traverses a tree by converting the labels on all leaves and combining the results 
of sub-trees: 


data Tree where Leaf : Int → Tree
                Branch : Tree → Tree → Tree
walk : (Int → a) → (a → a → a) → Tree → a
walk b f (Leaf x) = b x
walk b f (Branch l r) = f (walk b f l) (walk b f r)


The above code relies on pattern-matching on values of the Tree data type and 
higher-order functions b and f for accumulating the result. Now, how might an 
object-oriented programmer tackle the problem of traversing a tree-like struc- 
ture? The visitor pattern! With this pattern, the programmer specifies a “visitor” 
object which contains knowledge of what to do at every node of the tree, and 
tree objects must be able to accept a visitor with a method that will recursively 
walk down each subcomponent of the tree. In a pure style—which returns an 
accumulated result directly instead of using mutable state as a side channel for 
results—the visitor pattern for a simple binary tree interface will look like: 

codata TreeVisitor a where
  VisitLeaf : TreeVisitor a → (Int → a)
  VisitBranch : TreeVisitor a → (a → a → a)

codata Tree where
  Walk : Tree → (forall a. TreeVisitor a → a)

leaf : Int → Tree
leaf x = {Walk v → v.VisitLeaf x}
branch : Tree → Tree → Tree
branch l r = {Walk v → v.VisitBranch (l.Walk v) (r.Walk v)}

And again, we can write this same code more elegantly, without the need to
break apart the two arguments across the equal sign with a manual abstraction,
using copatterns as:

(leaf x).Walk v = v.VisitLeaf x 

(branch l r).Walk v = v.VisitBranch (l.Walk v) (r.Walk v)
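For comparison, here is a minimal Python sketch of the same pure visitor pattern. The classes `Leaf` and `Branch` play the role of the `leaf` and `branch` objects, while `SumVisitor` and `DepthVisitor` are hypothetical visitors invented for this illustration.

```python
class Leaf:
    def __init__(self, x):
        self.x = x

    def Walk(self, v):
        # (leaf x).Walk v = v.VisitLeaf x
        return v.VisitLeaf(self.x)


class Branch:
    def __init__(self, l, r):
        self.l, self.r = l, r

    def Walk(self, v):
        # (branch l r).Walk v = v.VisitBranch (l.Walk v) (r.Walk v)
        return v.VisitBranch(self.l.Walk(v), self.r.Walk(v))


class SumVisitor:
    """Adds up all leaf labels; results are returned, not stored in mutable state."""
    def VisitLeaf(self, x):
        return x

    def VisitBranch(self, left, right):
        return left + right


class DepthVisitor:
    """Computes the depth of the tree, counting a leaf as depth 1."""
    def VisitLeaf(self, x):
        return 1

    def VisitBranch(self, left, right):
        return 1 + max(left, right)


tree = Branch(Leaf(1), Branch(Leaf(2), Leaf(3)))
print(tree.Walk(SumVisitor()))    # 6
print(tree.Walk(DepthVisitor()))  # 3
```

The same tree object accepts visitors with different result types, which is what the `forall a` in the type of `Walk` expresses.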


Notice how the above code is just an object-oriented presentation of the following
encoding of binary trees into the polymorphic λ-calculus:

$$\begin{aligned}
\mathit{Tree} &= \forall a.\, \mathit{TreeVisitor}\ a \to a \qquad \mathit{TreeVisitor}\ a = (\mathit{Int} \to a) \times (a \to a \to a) \\
\mathit{leaf} &: \mathit{Int} \to \mathit{Tree} \\
\mathit{leaf}\ (x{:}\mathit{Int}) &= \Lambda a.\lambda v{:}\mathit{TreeVisitor}\ a.\ (\mathit{fst}\ v)\ x \\
\mathit{branch} &: \forall a.\, \mathit{Tree} \to \mathit{Tree} \to \mathit{Tree} \\
\mathit{branch}\ (l{:}\mathit{Tree})\ (r{:}\mathit{Tree}) &= \Lambda a.\lambda v{:}\mathit{TreeVisitor}\ a.\ (\mathit{snd}\ v)\ (l\ a\ v)\ (r\ a\ v)
\end{aligned}$$

The only essential difference between this λ-encoding of trees versus the
λ-encoding of booleans above is currying: the representation of the data type
Tree takes a single product TreeVisitor a of the necessary arguments, whereas
the data type Bool takes the two necessary arguments separately. Besides this
easily-converted difference of currying, the usual Böhm-Berarducci encodings
shown here correspond to a pure version of the visitor pattern.


2.2 Demand-Driven Programming 


In “Why functional programming matters” [23], Hughes motivates the utility 
of practical functional programming through its excellence in compositionality. 
When designing programs, one of the goals is to decompose a large problem into 
several manageable sub-problems, solve each sub-problem in isolation, and then 
compose the individual parts together into a complete solution. Unfortunately, 
Hughes identifies some examples of programs which resist this kind of approach. 

In particular, numeric algorithms (for computing square roots, derivatives,
and integrals) rely on an infinite sequence of approximations which converge on the
true answer only in the limit of the sequence. For these numeric algorithms, the 
decision on when a particular approximation in the sequence is “close enough” 
to the real answer lies solely in the eyes of the beholder: only the observer of 
the answer can say when to stop improving the approximation. As such, stan- 
dard imperative implementations of these numeric algorithms are expressed as 
a single, complex loop, which interleaves both the concerns of producing bet- 
ter approximations with the termination decision on when to stop. Even more 
complex is the branching structure of the classic minimax algorithm from arti- 
ficial intelligence for searching for reasonable moves in two-player games like 
chess, which can have an unreasonably large (if not infinite) search space. Here, 
too, there is difficulty separating generation from selection, and worse there is 
the intermediate step of pruning out uninteresting sub-trees of the search space 
(known as alpha-beta pruning). As a result, a standard imperative implemen- 
tation of minimax is a single, recursive function that combines all the tasks— 
generation, pruning, estimation, and selection—at once. 




Hughes shows how both instances of failed decomposition can be addressed 
in functional languages through the technique of demand-driven programming. 
In each case, the main obstacle is that the control of how to drive the next 
step of the algorithm—whether to continue or not—lies with the consumer. The 
producer of potential approximations and game states, in contrast, should only 
take over when demanded by the consumer. By giving primary control to the 
consumer, each of these problems can be decomposed into sensible sub-tasks, and 
recomposed back together. Hughes uses lazy evaluation, as found in languages 
like Miranda and Haskell, in order to implement the demand-driven algorithms. 
However, the downside of relying on lazy evaluation is that it is a whole-language 
decision: a language is either lazy by default, like Haskell, or not, like OCaml. 
When working in a strict language, expressing these demand-driven algorithms 
with manual laziness loses much of their original elegance [33]. 

In contrast, a language should directly support the capability of yielding
control to the consumer independently of the language being strict or lazy,
analogously to what happens with lambda abstractions. An abstraction computes
on demand, so why is this property relegated to this predefined type only? In fact,
the concept of codata also has this property. As such, it allows us to describe
demand-driven programs in an agnostic way which works just as well in Haskell 
as in OCaml without any additional modification. For example, we can imple- 
ment Hughes’ demand-driven AI game in terms of codata instead of laziness. To 
represent the current game state, and all of its potential developments, we can 
use an arbitrarily-branching tree codata type. 


codata Tree a where
  Node : Tree a → a
  Children : Tree a → List (Tree a)


The task of generating all potential future boards from the current board state
produces one of these tree objects, described as follows (where moves of type
Board → List Board generates a list of possible moves):


gameTree : Board → Tree Board
(gameTree b).Node = b
(gameTree b).Children = map gameTree (moves b)


Notice that the tree might be finite, such as in the game of Tic-Tac-Toe. However, 
it would still be inappropriate to waste resources fully generating all moves 
before determining which are even worth considering. Fortunately, the fact that 
the responses of a codata object are only computed when demanded means that 
the consumer is in full control over how much of the tree is generated, just as in 
Hughes’ algorithm. This fact lets us write the following simplistic prune function 
which cuts off sub-trees at a fixed depth. 


prune : Int → Tree Board → Tree Board
(prune x t).Node = t.Node
(prune 0 t).Children = []
(prune x t).Children = map (prune (x-1)) t.Children

The more complex alpha-beta pruning algorithm can be written as its own pass,
similar to prune above. Just like Hughes’ original presentation, the evaluation
of the best move for the opponent is the composition of a few smaller functions:


eval = maximize . maptree score . prune 5 . gameTree 


What is the difference between this codata version of minimax and the one
presented by Hughes that makes use of laziness? They both compute on demand,
which makes the game efficient. However, demand-driven code written with
codata can be easily ported between strict and lazy languages with only
syntactic changes. In other words, codata is a general, portable programming feature
which is the key for compositionality in program design.¹
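To make the portability claim concrete, the following sketch transcribes `gameTree` and `prune` into Python, a strict language, using objects whose `Node` and `Children` responses are computed only when called. The toy `moves` function (each integer board has two successors, giving an infinite tree) and the helper `size` are assumptions made up for this illustration; they are not part of Hughes' example.

```python
class GameTree:
    """Codata tree: responses are computed only when observed."""
    def __init__(self, board, moves):
        self._board, self._moves = board, moves

    def Node(self):
        return self._board

    def Children(self):
        # Builds one level of sub-tree objects; their own children stay unevaluated.
        return [GameTree(b, self._moves) for b in self._moves(self._board)]


class Prune:
    """Cuts off sub-trees below a fixed depth, mirroring the prune copatterns."""
    def __init__(self, depth, tree):
        self._depth, self._tree = depth, tree

    def Node(self):
        return self._tree.Node()

    def Children(self):
        if self._depth == 0:
            return []
        return [Prune(self._depth - 1, c) for c in self._tree.Children()]


def moves(board):
    # Hypothetical toy game: every board has exactly two successor boards.
    return [2 * board, 2 * board + 1]


def size(tree):
    # A consumer that drives the generation of the tree by observing it.
    return 1 + sum(size(c) for c in tree.Children())


# The unpruned tree is infinite, yet the pruned one is finite and fully explored.
print(size(Prune(2, GameTree(1, moves))))  # 7
```

Only the part of the infinite tree that the consumer `size` actually observes is ever generated, even though Python evaluates function arguments eagerly.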


2.3 Abstraction Mechanism 


In the pursuit of scalable and maintainable program design, the typical followup 
to composability is abstraction. The basic purpose of abstraction is to hide cer- 
tain implementation details so that different parts of the code base need not be 
concerned with them. For example, a large program will usually be organized into 
several different parts or “modules,” some of which may hold general-purpose 
“library” code and others may be application-specific “clients” of those libraries. 
Successful abstractions will leverage tools of the programming language in ques- 
tion so that there is a clear interface between libraries and their clients, codi- 
fying which details are exposed to the client and which are kept hidden inside 
the library. A common such detail to hide is the concrete representation of some 
data type, like strings and collections. Clear abstraction barriers give freedom to 
both the library implementor (to change hidden details without disrupting any 
clients) as well as the client (to ignore details not exposed by the interface). 

Reynolds [35] identified, and Cook [12] later elaborated on, two different 
mechanisms to achieve this abstraction: abstract data types and procedural 
abstraction. Abstract data types are crisply expressed by the Standard ML mod- 
ule system, based on existential types, which serves as a concrete practical touch- 
stone for the notion. Procedural abstraction is pervasively used in object-oriented 
languages. However, due to the inherent differences among the many languages
and the way they express procedural abstraction, it may not be completely clear
what the “essence” is, the way existential types are the essence of modules.
What is the language-agnostic representation of procedural abstraction? Codata!
The combination of observation-based interfaces, message-passing, and dynamic
dispatch provides exactly the tools needed for procedural abstraction. Other common
object-oriented features—like inheritance, subtyping, encapsulation, and muta- 
ble state—are orthogonal to this particular abstraction goal. While they may 
be useful extensions to codata for accomplishing programming tasks, only pure 
codata itself is needed to represent abstraction. 


¹ To see the full code for all the examples of [24] implemented in terms of codata, visit
https://github.com/zachsully/codata_examples.

Specifying a codata type is giving an interface (between an implementation
and a client) so that instances of the type (implementations) can respond to
requests (clients). In fact, method calls are the only way to interact with our 
objects. As usual, there is no way to “open up” a higher-order function—one 
example of a codata type—and inspect the way it was implemented. The same 
intuition applies to all other codata types. For example, Cook’s [12] procedural 
“set” interface can be expressed as a codata type with the following observations: 


codata Set where
  IsEmpty : Set → Bool
  Contains : Set → Int → Bool
  Insert : Set → Int → Set
  Union : Set → Set → Set


Every single object of type Set will respond to these observations, which is 
the only way to interact with it. This abstraction barrier gives us the freedom of 
defining several different instances of Set objects that can all be freely composed 
with one another. One such instance of Set uses a list to keep track of a hidden 
state of the contained elements (where elemOf : List Int → Int → Bool
checks if a particular number is an element of the given list, and the operation
fold : (a → b → b) → b → List a → b is the standard functional fold):


finiteSet : List Int → Set
(finiteSet xs).IsEmpty = xs == []
(finiteSet xs).Contains y = elemOf xs y
(finiteSet xs).Insert y = finiteSet (y:xs)
(finiteSet xs).Union s = fold (λx t → t.Insert x) s xs

emptySet = finiteSet []


But of course, many other instances of Set can also be given. For example, 
this codata type interface also makes it possible to represent infinite sets like 
the set evens of all even numbers which is defined in terms of the more gen- 
eral evensUnion that unions all even numbers with some other set (where the 
function isEven : Int → Bool checks if a number is even):


evens = evensUnion emptySet

evensUnion : Set → Set
(evensUnion s).IsEmpty = False
(evensUnion s).Contains y = isEven y || s.Contains y
(evensUnion s).Insert y = evensUnion (s.Insert y)
(evensUnion s).Union t = evensUnion (s.Union t)


Because of the natural abstraction mechanism provided by codata, different Set 
implementations can interact with each other. For example, we can union a 
finite set and evens together because both definitions of Union know nothing 
of the internal structure of the other Set. Therefore, all we can do is apply the 
observations provided by the Set codata type. 
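The interaction between the two implementations can be sketched in Python, where duck typing stands in for the Set interface. The classes `FiniteSet` and `EvensUnion` follow the copattern definitions above; the concrete values `small` and `both` are hypothetical test data for this sketch.

```python
class FiniteSet:
    """A Set backed by a hidden, immutable tuple of elements."""
    def __init__(self, xs=()):
        self._xs = tuple(xs)

    def IsEmpty(self):
        return len(self._xs) == 0

    def Contains(self, y):
        return y in self._xs

    def Insert(self, y):
        return FiniteSet((y,) + self._xs)

    def Union(self, s):
        # Uses only the Insert observation of s, never its internal structure.
        for x in self._xs:
            s = s.Insert(x)
        return s


class EvensUnion:
    """The infinite set of all even numbers, unioned with another Set."""
    def __init__(self, s):
        self._s = s

    def IsEmpty(self):
        return False

    def Contains(self, y):
        return y % 2 == 0 or self._s.Contains(y)

    def Insert(self, y):
        return EvensUnion(self._s.Insert(y))

    def Union(self, t):
        return EvensUnion(self._s.Union(t))


empty_set = FiniteSet()
evens = EvensUnion(empty_set)
small = FiniteSet([1, 3])

# A finite set and an infinite set combine freely through the shared interface.
both = small.Union(evens)
print(both.Contains(3), both.Contains(10), both.Contains(5))  # True True False
```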
While sets of numbers are fairly simplistic, there are many more practical
real-world instances of the procedural abstraction provided by codata to be 
found in object-oriented languages. For example, databases are a good use of 
abstraction, where basic database queries can be represented as the observations 
on table objects. A simplified interface to a database table (containing rows of 
type a) with selection, deletion, and insertion, is given as follows: 


codata Database a where
  Select : Database a → (a → Bool) → List a
  Delete : Database a → (a → Bool) → Database a
  Insert : Database a → a → Database a


On one hand, specific implementations can be given for connecting to and com- 
municating with a variety of different databases—like Postgres, MySQL, or just 
a simple file system—which are hidden behind this interface. On the other hand, 
clients can write generic operations independently of any specific database, such 
as copying rows from one table to another or inserting a row into a list of com- 
patible tables: 


copy : Database a → Database a → Database a
copy from to = let rows = from.Select (λ_ → True)
               in foldr (λrow db → db.Insert row) to rows

insertAll : List (Database a) → a → List (Database a)
insertAll dbs row = map (λdb → db.Insert row) dbs


In addition to abstracting away the details of specific databases, both copy and 
insertAll can communicate between completely different databases by just 
passing in the appropriate object instances, which all have the same generic 
type. Another use of this generality is for testing. Besides the normal instances 
of Database a which perform permanent operations on actual tables, one can 
also implement a fictitious simulation which records changes only in temporary 
memory. That way, client code can be seamlessly tested by running and checking 
the results of simulated database operations that have no external side effects 
by just passing pure codata objects. 
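A simulated table of this kind might look as follows in Python. `MemoryTable` is a hypothetical in-memory stand-in invented for this sketch, and the parameters of `copy` are renamed to `src` and `dst` because `from` is a reserved word in Python. The generic clients never inspect how a table stores its rows.

```python
class MemoryTable:
    """A simulated table: rows live in an immutable tuple, with no external effects."""
    def __init__(self, rows=()):
        self._rows = tuple(rows)

    def Select(self, pred):
        return [r for r in self._rows if pred(r)]

    def Delete(self, pred):
        return MemoryTable(r for r in self._rows if not pred(r))

    def Insert(self, row):
        return MemoryTable(self._rows + (row,))


def copy(src, dst):
    # Mirrors the foldr in the text: the last selected row is inserted first.
    for row in reversed(src.Select(lambda _: True)):
        dst = dst.Insert(row)
    return dst


def insert_all(dbs, row):
    return [db.Insert(row) for db in dbs]


merged = copy(MemoryTable([1, 2]), MemoryTable([0]))
print(sorted(merged.Select(lambda _: True)))  # [0, 1, 2]
```

Any other object answering `Select`, `Delete`, and `Insert` (for example, one wrapping a real database connection) could be passed to `copy` and `insert_all` unchanged.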


2.4 Representing Pre- and Post-Conditions 


The extension of data types with indexes (a.k.a. generalized algebraic data types) 
has proven useful to statically verify a data structure’s invariant, like for red- 
black trees [43]. With indexed data types, the programmer can inform the static 
type system that a particular value of a data type satisfies some additional 
conditions by constraining the way in which it was constructed. Unsurprisingly, 
indexed codata types are dual and allow the creator of an object to constrain 
the way it is going to be used, thereby adding pre- and post-conditions to the 
observations of the object. In other words, in a language with type indexes, 
codata enables the programmer to express more information in its interface. 
This additional expressiveness simplifies applications that rely on a type
index to guard observations. Thibodeau et al. [40] give examples of such
programs, including an automaton specification where its transitions correspond
to an observation that changes a pre- and post-condition in its index, and a fair
resource scheduler where the observation of several resources is controlled by an
index tracking the number of times they have been accessed. For concreteness,
let’s use an indexed codata type to specify safe protocols as in the following
example from an object-oriented language with guarded methods:


index Raw, Bound, Live

codata Socket i where
  Bind : Socket Raw → String → Socket Bound
  Connect : Socket Bound → Socket Live
  Send : Socket Live → String → ()
  Receive : Socket Live → String
  Close : Socket Live → ()


This example comes from DeLine and Fähndrich [14], where they present an
extension to C# constraining the pre- and post-conditions for method calls. If
we have an instance of this Socket i interface, then observing it through the 
above methods can return new socket objects with a different index. The index 
thereby governs the order in which clients are allowed to apply these methods. 
A socket will start with the index Raw. The only way to use a Socket Raw is to 
Bind it, and the only way to use a Socket Bound is to Connect it. This forces 
us to follow a protocol when initializing a Socket. 
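Python has no indexed types, so the following is only a loose approximation of the protocol: each index (Raw, Bound, Live) becomes its own class that offers just the observations allowed in that state. Violations are therefore detected when the program runs (or by an optional external type checker), not by a static type system as in the indexed codata declaration. The echoing behavior of `Receive` is a made-up placeholder for real network I/O.

```python
class LiveSocket:
    def __init__(self, address):
        self._address, self._sent = address, []

    def Send(self, msg):
        self._sent.append(msg)

    def Receive(self):
        # Hypothetical echo behavior standing in for a real network read.
        return self._sent[-1] if self._sent else ""

    def Close(self):
        return None


class BoundSocket:
    def __init__(self, address):
        self._address = address

    def Connect(self):
        # Socket Bound -> Socket Live
        return LiveSocket(self._address)


class RawSocket:
    def Bind(self, address):
        # Socket Raw -> String -> Socket Bound
        return BoundSocket(address)


# The only route to a socket that can Send goes through Bind and then Connect.
live = RawSocket().Bind("localhost:8080").Connect()
live.Send("ping")
print(live.Receive())  # ping
```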


Intermezzo 1. This declaration puts one aspect in the hands of the programmer,
though. A client can open a socket and never close it, hogging the resource.
We can remedy this problem with linear types, which force us to address any
loose ends before finishing the program. With linear types, it would be a type
error to have a lingering Live socket lying around at the end of the program,
and a call to Close would use it up. Furthermore, linear types would ensure
that outdated copies of Socket objects cannot be used again, which is especially
appropriate for actions like Bind, which is meant to transform a Raw socket
into a Bound one, and likewise for Connect, which transforms a Bound socket
into a Live one. Even better, enhancing linear types with a more sophisticated
notion of ownership (like in the Rust programming language, which differentiates
a permanent transfer of ownership from temporarily borrowing it) makes this
resource-sensitive interface especially pleasant. Observations like Bind, Connect,
and Close, which are meant to fully consume the observed object, would transfer
full ownership of the object itself to the method call and effectively replace the
old object with the returned one. In contrast, observations like Send and Receive,
which are meant to be repeated on the same object, would merely borrow the
object for the duration of the action so that it could be used again.


3 Inter-compilation of Core Calculi 


We previously saw examples of using codata types to replicate well-known
encodings of data types into the λ-calculus. Now, let’s dive in and show how data and
codata types formally relate to one another. In order to demonstrate the
relationship, we will consider two small languages that extend the common polymorphic
λ-calculus: $\lambda^{data}$ extends λ with user-defined algebraic data types, and
$\lambda^{codata}$ extends λ with user-defined codata types. In the end, we will find
that both of these foundational languages can be inter-compiled into one another.
Data can be represented by codata via the visitor pattern ($\mathcal{V}$). Codata can be
represented by data by tabulating the possible answers of objects ($\mathcal{T}$).


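[Diagram: $\lambda^{data}$ and $\lambda^{codata}$ connected by two opposing arrows, Visitor ($\mathcal{V}$) from $\lambda^{data}$ to $\lambda^{codata}$ and Tabulate ($\mathcal{T}$) from $\lambda^{codata}$ to $\lambda^{data}$]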


In essence, this demonstrates how to compile programs between the functional
and object-oriented paradigms. The $\mathcal{T}$ direction shows how to extend existing
functional languages (like OCaml, Haskell, or Racket) with codata objects
without changing their underlying representation. Dually, the $\mathcal{V}$ direction shows how
to compile functional programs with data types into an object-oriented target
language (like JavaScript).

Each of the encodings is a macro expansion, in the sense that it leaves the
underlying base λ-calculus constructs of functions, applications, and variables
unchanged (as opposed to, for example, continuation-passing style translations).
The encodings are defined to operate on untyped terms, but they also preserve
typability when given well-typed terms. The naive encodings preserve the operational
semantics of the original term, according to a call-by-name semantics. We also
illustrate how the encodings can be modified slightly to correctly simulate the
call-by-value operational semantics of the source program. To conclude, we show
how the languages and encodings can be generalized to more expressive type
systems, which include features like existential types and indexed types (a.k.a.
generalized algebraic data types and guarded methods).


Notation. We use both an overline $\overline{t}$ and dots $t_1 \ldots$ to indicate a sequence of
terms $t$ (and likewise for types, variables, etc.). The arrow type $\overline{\tau} \to T$ means
$\tau_1 \to \cdots \to \tau_n \to T$; when $n$ is 0, it is not a function type, i.e. just the codomain
$T$. The application $K\ \overline{t}$ means $(((K\ t_1) \ldots)\ t_n)$; when $n$ is 0, it is not a
function application, but the constant $K$. We write a single step of an operational
semantics with the arrow $\mapsto$, and many steps (i.e. its reflexive-transitive closure)
as $\mapsto^{*}$. Operational steps may occur within an evaluation context $E$, i.e. $t \mapsto t'$
implies that $E[t] \mapsto E[t']$.


3.1 Syntax and Semantics 


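We present the syntax and semantics of the base language and the two extensions
$\lambda^{data}$ and $\lambda^{codata}$. For the sake of simplicity, we keep the languages as minimal
as possible to illustrate the main inter-compilations. Therefore, $\lambda^{data}$ and $\lambda^{codata}$
do not contain recursion, nested (co)patterns, or indexed types. The extension
with recursion is standard, and an explanation of compiling (co)patterns can be
found in [11,38,39]. Indexed types are later discussed informally in Sect. 3.6.

Syntax:

$$\begin{array}{lrcl}
\textit{Type} \ni & \tau, \rho & ::= & a \mid \tau \to \rho \mid \forall a.\tau \\
\textit{Term} \ni & t, u, e & ::= & x \mid t\ u \mid \lambda x.e
\end{array}$$

Operational Semantics:

$$\begin{array}{ll}
\textit{Call-by-name} & \textit{Call-by-value} \\
V ::= x \mid \lambda x.e \qquad E ::= \square \mid E\ u & V ::= x \mid \lambda x.e \qquad E ::= \square \mid E\ u \mid V\ E \\
(\lambda x.e)\ u \mapsto e[u/x] & (\lambda x.e)\ V \mapsto e[V/x]
\end{array}$$

Type System (where $S = t$ for call-by-name and $S = V$ for call-by-value):

$$\dfrac{x : \tau \in \Gamma}{\Gamma \vdash x : \tau} \qquad \dfrac{\Gamma \vdash t : \tau \to \rho \quad \Gamma \vdash u : \tau}{\Gamma \vdash t\ u : \rho} \qquad \dfrac{\Gamma, x : \tau \vdash e : \rho}{\Gamma \vdash \lambda x.e : \tau \to \rho}$$

$$\dfrac{\Gamma, a \vdash S : \tau}{\Gamma \vdash S : \forall a.\tau} \qquad \dfrac{\Gamma \vdash t : \forall a.\tau \quad \Gamma \vdash \rho}{\Gamma \vdash t : \tau[\rho/a]}$$

Fig. 1. Polymorphic λ-calculus: the base language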


The Base Language. We will base both our core languages of interest on a
common starting point: the polymorphic λ-calculus as shown in Fig. 1.² This is
the standard simply typed λ-calculus extended with impredicative polymorphism
(a.k.a. generics). There are only three forms of terms (variables $x$, applications
$t\ u$, and function abstractions $\lambda x.e$) and three forms of types (type variables $a$,
function types $\tau \to \rho$, and polymorphic types $\forall a.\tau$). We keep the type
abstraction and instantiation implicit in programs (as opposed to explicit as in System
F) for two reasons. First, this more accurately resembles the functional
languages in which types are inferred, as opposed to mandatory annotations explicit
within the syntax of programs. Second, it more clearly shows how the
translations that follow do not rely on first knowing the type of terms, but apply to any
untyped term. In other words, the compilation techniques are also appropriate
for dynamically typed languages like Scheme and Racket.

Figure 1 reviews both the standard call-by-name and call-by-value
operational semantics for the λ-calculus. As usual, the difference between the two is
that in call-by-value, the argument of a function call is evaluated prior to
substitution, whereas in call-by-name the argument is substituted first. This is implied
by the different set of evaluation contexts ($E$) and the fact that the operational
rule uses a more restricted notion of value ($V$) for substitutable arguments in
call-by-value. Note that there is an interplay between evaluation and typing. In
a more general setting where effects are allowed, the typing rule for introducing
polymorphism (i.e. the rule with $S : \forall a.\tau$ in the conclusion) is only safe for
substitutable terms, which imposes the well-known value restriction for
call-by-value (limiting $S$ to values), but requires no such restriction in call-by-name
where every term is a substitutable value (letting $S$ be any term).
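The operational difference between the two strategies can be observed in a small Python experiment. Python itself is call-by-value, so call-by-name is only simulated here by passing an unevaluated zero-argument function (a thunk) that the callee forces at each use. All function names and the `log` list are hypothetical and exist only to record when the argument is evaluated.

```python
log = []


def argument():
    # Records every evaluation of the argument expression.
    log.append("evaluated")
    return 21


def ignore_by_value(x):
    # Call-by-value: x was already evaluated before the call began.
    return 0


def ignore_by_name(x):
    # Simulated call-by-name: x is a thunk that is never forced here.
    return 0


def double_by_name(x):
    # Each use of the parameter re-evaluates the substituted argument.
    return x() + x()


ignore_by_value(argument())
by_value_count = len(log)   # the argument ran once, although it was unused

log.clear()
ignore_by_name(argument)
unused_count = len(log)     # the argument never ran

log.clear()
doubled = double_by_name(argument)
by_name_count = len(log)    # the argument ran once per use
```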


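² The judgement $\Gamma \vdash \rho$ should be read as: all free type variables in $\rho$ occur in $\Gamma$. As
usual, $\Gamma, a$ means that $a$ does not occur free in $\Gamma$.

Syntax:

$$\begin{array}{lrcl}
\textit{Declaration} \ni & d & ::= & \mathbf{data}\ T\ \overline{a}\ \mathbf{where}\ K : \overline{\tau} \to T\ \overline{a}\ \ldots \\
\textit{Type} \ni & \tau, \rho & ::= & a \mid \tau \to \rho \mid \forall a.\tau \mid T\ \overline{\rho} \\
\textit{Term} \ni & t, u, e & ::= & x \mid t\ u \mid \lambda x.e \mid K\ \overline{t} \mid \mathbf{case}\ t\ \{K\ \overline{x} \to e, \ldots\}
\end{array}$$

Operational Semantics:

$$\begin{array}{ll}
\textit{Call-by-name} & \textit{Call-by-value} \\
V ::= \cdots \mid K\ \overline{t} & V ::= \cdots \mid K\ \overline{V} \\
E ::= \cdots \mid \mathbf{case}\ E\ \{K\ \overline{x} \to e, \ldots\} & E ::= \cdots \mid \mathbf{case}\ E\ \{K\ \overline{x} \to e, \ldots\} \mid K\ \overline{V}\ E\ \overline{t} \\
\mathbf{case}\ (K\ \overline{t})\ \{K\ \overline{x} \to e, \ldots\} \mapsto e[\overline{t}/\overline{x}] & \mathbf{case}\ (K\ \overline{V})\ \{K\ \overline{x} \to e, \ldots\} \mapsto e[\overline{V}/\overline{x}]
\end{array}$$

Type System:

$$\dfrac{K : \forall \overline{a}.\, \overline{\tau} \to T\ \overline{a} \in \Gamma \quad \Gamma \vdash \overline{t} : \overline{\tau}[\overline{\rho}/\overline{a}]}{\Gamma \vdash K\ \overline{t} : T\ \overline{\rho}}$$

$$\dfrac{\Gamma \vdash t : T\ \overline{\rho} \quad K_1 : \forall \overline{a}.\, \overline{\tau_1} \to T\ \overline{a} \in \Gamma \quad \Gamma, \overline{x_1} : \overline{\tau_1}[\overline{\rho}/\overline{a}] \vdash e_1 : \tau' \quad \ldots}{\Gamma \vdash \mathbf{case}\ t\ \{K_1\ \overline{x_1} \to e_1, \ldots\} : \tau'}$$

Fig. 2. $\lambda^{data}$: Extending polymorphic λ-calculus with data types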


A Language with Data. The first extension of the λ-calculus is with
user-defined data types, as shown in Fig. 2; it corresponds to a standard core language
for statically typed functional languages. Data declarations introduce a new type
constructor ($T$) as well as some number of associated constructors ($K$) that build
values of that data type. For simplicity, the branches in a case expression
are considered unordered and non-overlapping (i.e. no two branches for the same
constructor within a single case expression). The types of constructors are given
alongside free variables in $\Gamma$, and the typing rule for constructors requires they
be fully applied. We also assume an additional side condition to the typing rule
for case expressions that the branches are exhaustive (i.e. every constructor of
the data type in question is covered as a premise).

Figure 2 presents the extension to the operational semantics from Fig. 1,
which is also standard. The new evaluation rule for data types reduces a case
expression matched with an applied constructor. Note that since the branches
are unordered, the one matching the constructor is chosen out of the
possibilities and the parameters of the constructor are substituted in the branch’s
pattern. There is also an additional form of constructed values: in call-by-name
any constructor application is a value, whereas in call-by-value only a constructor
applied to other values is a value. As such, call-by-value goes on to
evaluate constructor parameters in advance, as shown by the extra evaluation
context. In both evaluation strategies, there is a new form of evaluation context
that points out the discriminant of a case expression, since it is mandatory to
determine which constructor was used before deciding the appropriate branch
to take.

Syntax:

$$\begin{array}{lrcl}
\textit{Declaration} \ni & d & ::= & \mathbf{codata}\ U\ \overline{a}\ \mathbf{where}\ H : U\ \overline{a} \to \tau\ \ldots \\
\textit{Type} \ni & \tau, \rho & ::= & a \mid \tau \to \rho \mid \forall a.\tau \mid U\ \overline{\rho} \\
\textit{Term} \ni & t, u, e & ::= & x \mid t\ u \mid \lambda x.e \mid t.H \mid \{H \to e, \ldots\}
\end{array}$$

Operational Semantics:

$$\begin{array}{ll}
\textit{Call-by-name} & \textit{Call-by-value} \\
V ::= \cdots \mid \{H \to e, \ldots\} \qquad E ::= \cdots \mid E.H & V ::= \cdots \mid \{H \to e, \ldots\} \qquad E ::= \cdots \mid E.H \\
\{H \to e, \ldots\}.H \mapsto e & \{H \to e, \ldots\}.H \mapsto e
\end{array}$$

Type System:

$$\dfrac{H : \forall \overline{a}.\, U\ \overline{a} \to \tau \in \Gamma \quad \Gamma \vdash t : U\ \overline{\rho}}{\Gamma \vdash t.H : \tau[\overline{\rho}/\overline{a}]} \qquad \dfrac{\Gamma \vdash H_1 : U\ \overline{\rho} \to \tau_1 \quad \Gamma \vdash e_1 : \tau_1 \quad \ldots}{\Gamma \vdash \{H_1 \to e_1, \ldots\} : U\ \overline{\rho}}$$

Fig. 3. $\lambda^{codata}$: Extending polymorphic λ-calculus with codata types


A Language with Codata. The second extension of the λ-calculus is with
user-defined codata types, as shown in Fig. 3. Codata declarations in $\lambda^{codata}$
define a new type constructor ($U$) along with some number of associated
destructors ($H$) for projecting responses out of values of a codata type. The type level
of $\lambda^{codata}$ corresponds directly to $\lambda^{data}$. However, at the term level, we have
codata observations of the form $t.H$ using “dot notation”, which can be thought
of as sending the message $H$ to the object $t$ or as a method invocation from
object-oriented languages. Values of codata types are introduced in the form
$\{H_1 \to e_1, \ldots, H_n \to e_n\}$, which lists each response this value gives to all the
possible destructors of the type. As with case expressions, we take the branches
to be unordered and non-overlapping for simplicity.

Interestingly, the extension of the operational semantics with codata (the
values, evaluation contexts, and reduction rules) is identical for both call-by-name
and call-by-value evaluation. In either evaluation strategy, a codata object
$\{H \to e, \ldots\}$ is considered a value and the codata observation $t.H$ must evaluate
$t$ no matter what to continue, leading to the same form of evaluation context
$E.H$. The additional evaluation rule selects and invokes the matching branch of
a codata object and is the same regardless of the evaluation strategy.

Note that values of codata types are the same in any evaluation strategy
because the branches of the object are only ever
evaluated on demand, i.e. when they are observed by a destructor, similar to
the fact that the body of a function is only ever evaluated when the function is
called. This is the semantic difference that separates codata types from records
found in many programming languages. Records typically map a collection of
labels to a collection of values, which are evaluated in advance in a call-by-value
language similar to the constructed values of data types. With codata
objects, in contrast, labels map to behavior which is only invoked when observed.
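The distinction between records and codata objects can be observed directly in a call-by-value language such as Python. In this sketch, a dictionary plays the role of a record and a class with methods plays the role of a codata object; the helper `expensive` and the `trace` list are hypothetical and only record when a field's expression is actually evaluated.

```python
trace = []


def expensive(label):
    # Stands in for an arbitrary computation; logs each time it runs.
    trace.append(label)
    return len(label)


# A record: every field is evaluated as soon as the record is built.
record = {"First": expensive("first"), "Second": expensive("second")}
trace_after_record = list(trace)

trace.clear()


class Pair:
    # A codata object: each response is computed only when it is observed.
    def First(self):
        return expensive("first")

    def Second(self):
        return expensive("second")


obj = Pair()
trace_after_object = list(trace)       # nothing has been evaluated yet

first = obj.First()
trace_after_observation = list(trace)  # only the observed branch ran
```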
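$$\begin{array}{l}
\mathbf{data}\ T\ \overline{a}\ \mathbf{where} \\
\quad K_1 : \overline{\tau_1} \to T\ \overline{a} \\
\quad \vdots \\
\quad K_n : \overline{\tau_n} \to T\ \overline{a}
\end{array}
\quad \overset{\mathcal{V}}{\Longrightarrow} \quad
\begin{array}{l}
\mathbf{codata}\ T_{visit}\ \overline{a}\ b\ \mathbf{where} \\
\quad K_1 : T_{visit}\ \overline{a}\ b \to \overline{\tau_1} \to b \\
\quad \vdots \\
\quad K_n : T_{visit}\ \overline{a}\ b \to \overline{\tau_n} \to b \\
\mathbf{codata}\ T\ \overline{a}\ \mathbf{where} \\
\quad \mathit{Case}_T : T\ \overline{a} \to \forall b.\, T_{visit}\ \overline{a}\ b \to b
\end{array}$$

$$\begin{aligned}
\mathcal{V}[\![K_i\ \overline{t}]\!] &= \{\mathit{Case}_T \to \lambda v.\ (v.K_i)\ \overline{\mathcal{V}[\![t]\!]}\} \\
\mathcal{V}[\![\mathbf{case}\ t\ \{K_1\ \overline{x_1} \to e_1, \ldots\}]\!] &= (\mathcal{V}[\![t]\!].\mathit{Case}_T)\ \{K_1 \to \lambda \overline{x_1}.\ \mathcal{V}[\![e_1]\!], \ldots\}
\end{aligned}$$

Fig. 4. $\mathcal{V} : \lambda^{data} \to \lambda^{codata}$ mapping data to codata via the visitor pattern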


The additional typing rules for $\lambda^{codata}$ are also given in Fig. 3. The rule for
typing $t.H$ is analogous to a combination of type instantiation and application,
when viewing $H$ as a function of the given type. The rule for typing a codata
object, in contrast, is similar to the rule for typing a case expression of a data
type. However, in this comparison, the rule for objects is partially “upside down”
in the sense that the primary type in question ($U\ \overline{\rho}$) appears in the conclusion
rather than as a premise. This is the reason why there is one fewer premise for
typing codata objects than there is for typing data case expressions. As with
that rule, we assume that the branches are exhaustive, so that every destructor
of the codata type appears in the premise.


3.2 Compiling Data to Codata: The Visitor Pattern 


In Sect. 2.1, we illustrated how to convert a data type representing trees into a 
codata type. This encoding corresponds to a rephrasing of the object-oriented 
visitor pattern to avoid unnecessary side-effects. Now let us look more generally
at the pattern, to see how any algebraic data type in $\lambda^{data}$ can be encoded in
terms of codata in $\lambda^{codata}$.

The visitor pattern has the net effect of inverting the orientation of a data 
declaration (wherein construction comes first) into codata declarations (wherein 
destruction comes first). This reorientation can be used for compiling
user-defined data types in $\lambda^{data}$ to codata types in $\lambda^{codata}$, as shown in Fig. 4. As
with all of the translations we will consider, this is a macro expansion since
the syntactic forms from the base $\lambda$-calculus are treated homomorphically (i.e.
$\mathcal{V}[\![\lambda x.e]\!] = \lambda x.\ \mathcal{V}[\![e]\!]$, $\mathcal{V}[\![t\ u]\!] = \mathcal{V}[\![t]\!]\ \mathcal{V}[\![u]\!]$, and $\mathcal{V}[\![x]\!] = x$). Furthermore, this
translation also perfectly preserves types, since the types of terms are exactly
the same after translation (i.e. $\mathcal{V}[\![\tau]\!] = \tau$).

Notice how each data type ($T\ \overline{a}$) gets represented by two codata types: the
“visitor” ($T_{visit}\ \overline{a}\ b$), which says what to do with values made with each
constructor, and the type itself ($T\ \overline{a}$), which has one method that accepts a visitor
and returns a value of type $b$. An object of the codata type, then, must be capable of
accepting any visitor, no matter what type of result it returns. Also notice that
we include no other methods in the codata type representation of $T\ \overline{a}$.

At the level of terms, first consider how the case expression of the data type
is encoded. The branches of the case (contained within the curly braces) are
represented as a first-class object of the visitor type: each constructor is mapped
to the corresponding destructor of the same name and the variables bound in
the pattern are mapped to parameters of the function returned by the object
in each case. The whole case expression itself is then implemented by calling
the sole method ($Case_T$) of the codata object and passing the branches of the
case as the corresponding visitor object. Shifting focus to the constructors, we
can now see that they are compiled as objects that invoke the corresponding
destructor on any given visitor, and the terms which were parameters to the
constructor are now parameters to a given visitor’s destructor. Of course, other
uses of the visitor pattern might involve a codata type ($T$) with more methods
implementing additional functionality besides case analysis. However, we only
need the one method to represent data types in $\lambda^{data}$ because case expressions
are the primitive destructor for values of data types in the language.

For example, consider applying the above visitor pattern to a binary tree 
data type as follows: 


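\[
\mathcal{V}\left[\!\!\left[
\begin{array}{l}
\textbf{data}\ Tree\ \textbf{where} \\
\quad Leaf : Int \to Tree \\
\quad Branch : Tree \to Tree \to Tree
\end{array}
\right]\!\!\right]
=
\begin{array}{l}
\textbf{codata}\ Tree_{visit}\ b\ \textbf{where} \\
\quad Leaf : Int \to b \\
\quad Branch : Tree \to Tree \to b \\
\textbf{codata}\ Tree\ \textbf{where} \\
\quad Case_{Tree} : Tree \to \forall b.\ Tree_{visit}\ b \to b
\end{array}
\]

\[
\begin{aligned}
\mathcal{V}[\![Leaf\ n]\!] &= \{Case_{Tree} \to \lambda v.\ v.Leaf\ n\} \\
\mathcal{V}[\![Branch\ l\ r]\!] &= \{Case_{Tree} \to \lambda v.\ v.Branch\ l\ r\}
\end{aligned}
\]

\[
\mathcal{V}\left[\!\!\left[
\textbf{case}\ t\ \left\{
\begin{array}{l}
Leaf\ n \to e_1 \\
Branch\ l\ r \to e_2
\end{array}
\right\}
\right]\!\!\right]
=
(\mathcal{V}[\![t]\!].Case_{Tree})\ \left\{
\begin{array}{l}
Leaf \to \lambda n.\ \mathcal{V}[\![e_1]\!] \\
Branch \to \lambda l.\ \lambda r.\ \mathcal{V}[\![e_2]\!]
\end{array}
\right\}
\]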


Note how this encoding differs from the one that was given in Sect. 2.1 since the
$Case_{Tree}$ method is non-recursive whereas the $Walk_{Tree}$ method was recursive, in
order to model a depth-first search traversal of the tree.
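
The Tree encoding above can be sketched in Haskell with higher-rank types. This is only an illustration under assumptions: the names `TreeD`, `TreeVisit`, `visitLeaf`, `visitBranch`, `encode`, and `sumTree` are hypothetical and do not come from the paper's implementation, and Haskell records stand in for codata objects.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The original data type, kept only for comparison (hypothetical name).
data TreeD = LeafD Int | BranchD TreeD TreeD

-- Visitor codata type: one destructor per constructor of the data type.
data TreeVisit b = TreeVisit
  { visitLeaf   :: Int -> b
  , visitBranch :: Tree -> Tree -> b
  }

-- The encoded tree: its single destructor accepts any visitor.
newtype Tree = Tree { caseTree :: forall b. TreeVisit b -> b }

-- Constructors become objects that call the matching visitor destructor.
leaf :: Int -> Tree
leaf n = Tree (\v -> visitLeaf v n)

branch :: Tree -> Tree -> Tree
branch l r = Tree (\v -> visitBranch v l r)

-- Translate the data representation into the codata one.
encode :: TreeD -> Tree
encode (LeafD n)     = leaf n
encode (BranchD l r) = branch (encode l) (encode r)

-- A case expression becomes a call to caseTree with the branches packaged as
-- a visitor; recursion stays ordinary function recursion.
sumTree :: Tree -> Int
sumTree t = caseTree t TreeVisit
  { visitLeaf   = \n -> n
  , visitBranch = \l r -> sumTree l + sumTree r
  }
```

For instance, `sumTree (encode (BranchD (LeafD 1) (BranchD (LeafD 2) (LeafD 3))))` evaluates to 6.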

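For codata types with $n$ destructors, where $n \geq 1$:

\[
\mathcal{T}\left[\!\!\left[
\begin{array}{l}
\textbf{codata}\ U\ \overline{a}\ \textbf{where} \\
\quad H_1 : U\ \overline{a} \to \tau_1 \\
\quad \vdots \\
\quad H_n : U\ \overline{a} \to \tau_n
\end{array}
\right]\!\!\right]
=
\begin{array}{l}
\textbf{data}\ U\ \overline{a}\ \textbf{where} \\
\quad Table_U : \tau_1 \to \cdots \to \tau_n \to U\ \overline{a}
\end{array}
\]

\[
\begin{aligned}
\mathcal{T}[\![t.H_i]\!] &= \textbf{case}\ \mathcal{T}[\![t]\!]\ \{Table_U\ y_1 \ldots y_n \to y_i\} \\
\mathcal{T}[\![\{H_1 \to e_1, \ldots, H_n \to e_n\}]\!] &= Table_U\ \mathcal{T}[\![e_1]\!] \ldots \mathcal{T}[\![e_n]\!]
\end{aligned}
\]

For codata types with 0 destructors (where Unit is the same for every such $U$):

\[
\mathcal{T}\left[\!\!\left[
\begin{array}{l}
\textbf{codata}\ U\ \overline{a}\ \textbf{where} \\
\quad \text{-- no destructors}
\end{array}
\right]\!\!\right]
=
\begin{array}{l}
\textbf{data}\ Unit\ \textbf{where} \\
\quad unit : Unit
\end{array}
\qquad
\mathcal{T}[\![\{\}]\!] = unit
\]

Fig. 5. $\mathcal{T} : \lambda^{codata} \to \lambda^{data}$ tabulating codata responses with data tuples


Of course, other operations, like the walk function, could be written in terms
of case expressions and recursion as usual by an encoding with the above method
calls. However, it is possible to go one step further and include other primitive
destructors (like recursors or iterators in the style of Gödel's System T) by
embedding them as other methods of the encoded codata type. For example, we
can represent walk as a primitive destructor, as it was in Sect. 2.1, in addition
to non-recursive case analysis by adding an alternative visitor $Tree_{walk}$ and one
more destructor to the generated Tree codata type like so:

\[
\begin{array}{ll}
\begin{array}{l}
\textbf{codata}\ Tree_{walk}\ b\ \textbf{where} \\
\quad Leaf : Int \to b \\
\quad Branch : b \to b \to b
\end{array}
&
\begin{array}{l}
\textbf{codata}\ Tree\ \textbf{where} \\
\quad Case_{Tree} : Tree \to \forall b.\ Tree_{visit}\ b \to b \\
\quad Walk_{Tree} : Tree \to \forall b.\ Tree_{walk}\ b \to b
\end{array}
\end{array}
\]

\[
\begin{aligned}
\mathcal{V}[\![Leaf\ n]\!] &= \left\{
\begin{array}{l}
Case_{Tree} \to \lambda v.\ v.Leaf\ n \\
Walk_{Tree} \to \lambda w.\ w.Leaf\ n
\end{array}
\right\} \\
\mathcal{V}[\![Branch\ l\ r]\!] &= \left\{
\begin{array}{l}
Case_{Tree} \to \lambda v.\ v.Branch\ l\ r \\
Walk_{Tree} \to \lambda w.\ w.Branch\ (l.Walk_{Tree})\ (r.Walk_{Tree})
\end{array}
\right\}
\end{aligned}
\]

where the definition of $Tree_{visit}$ and the encoding of case expressions are the same.
In other words, this compilation technique can generalize to as many primitive
observations and recursion schemes as desired.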
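
The two-destructor Tree encoding (case analysis plus a primitive walk) can be sketched in Haskell as follows. This is a sketch under assumptions, not the paper's implementation: all Haskell names are hypothetical, records play the role of codata objects, and the recursive observations pass the walk visitor `w` along explicitly so that the sketch type-checks.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Visitor for non-recursive case analysis: sub-trees are passed as trees.
data TreeVisit b = TreeVisit
  { visitLeaf   :: Int -> b
  , visitBranch :: Tree -> Tree -> b
  }

-- Visitor for the recursive walk: sub-trees are passed as computed results.
data TreeWalk b = TreeWalk
  { walkLeaf   :: Int -> b
  , walkBranch :: b -> b -> b
  }

-- The encoded tree now answers two observations.
data Tree = Tree
  { caseTree :: forall b. TreeVisit b -> b
  , walkTree :: forall b. TreeWalk b -> b
  }

leaf :: Int -> Tree
leaf n = Tree
  { caseTree = \v -> visitLeaf v n
  , walkTree = \w -> walkLeaf w n
  }

branch :: Tree -> Tree -> Tree
branch l r = Tree
  { caseTree = \v -> visitBranch v l r
  , walkTree = \w -> walkBranch w (walkTree l w) (walkTree r w)
  }

-- With walkTree, a fold needs no explicit recursion at the use site.
sumTree :: Tree -> Int
sumTree t = walkTree t (TreeWalk { walkLeaf = id, walkBranch = (+) })

depth :: Tree -> Int
depth t = walkTree t (TreeWalk { walkLeaf = const 1, walkBranch = \a b -> 1 + max a b })
```

With `t = branch (leaf 1) (branch (leaf 2) (leaf 3))`, `sumTree t` is 6 and `depth t` is 3.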


3.3 Compiling Codata to Data: Tabulation 


Having seen how to compile data to codata, how can we go the other way? The 
reverse compilation would be useful for extending functional languages with 
user-defined codata types, since many functional languages are compiled to a 
core representation based on the $\lambda$-calculus with data types.

Intuitively, the declared data types in $\lambda^{data}$ can be thought of as “sums of
products.” In contrast, the declared codata types in $\lambda^{codata}$ can be thought of as
“products of functions.” Since both core languages are based on the $\lambda$-calculus,
which has higher-order functions, the main challenge is to relate the two notions
of “products.” The codata sense of products is based on projections out of
abstract objects, where the different parts are viewed individually and only when
demanded. The data sense of products, instead, is based on tuples, in which
all components are laid out in advance in a single concrete structure.

One way to convert codata to data is to tabulate an object’s potential answers
ahead of time into a data structure. This is analogous to the fact that a function
of type Bool → String can be alternatively represented by a tuple of type
String * String, where the first and second components are the responses of
the original function to true and false, respectively. This idea can be applied
to $\lambda^{codata}$ in general as shown in the compilation in Fig. 5.
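
The Bool → String analogy can be made concrete with a small Haskell sketch. The function names `answer`, `tabulate`, and `apply` are hypothetical and chosen only for this illustration.

```haskell
-- A function out of Bool, viewed as an object with two possible observations.
answer :: Bool -> String
answer True  = "yes"
answer False = "no"

-- Tabulate: record the response to True first and to False second.
tabulate :: (Bool -> String) -> (String, String)
tabulate f = (f True, f False)

-- Look up: recover the function by projecting the matching component.
apply :: (String, String) -> Bool -> String
apply (t, _) True  = t
apply (_, f) False = f
```

Here `tabulate answer` is `("yes", "no")`, and `apply (tabulate answer)` agrees with `answer` on both inputs.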

A codata declaration of $U$ becomes a data declaration with a single
constructor ($Table_U$) representing a tuple containing the response for each of the
original destructors of $U$. At the term level, a codata abstraction is compiled by
concretely tabulating each of its responses into a tuple using the $Table_U$
constructor. A destructor application returns the specific component of the constructed
tuple which corresponds to that projection. Note that, since we assume that each
object is exhaustive, the tabulation transformation is relatively straightforward;
filling in “missing” method definitions with some error value that can be stored
in the tuple at the appropriate index would be done in advance as a separate
pre-processing step.

Also notice that there is a special case for non-observable “empty” codata 
types, which are all collapsed into a single pre-designated Unit data type. The 
reason for this collapse is to ensure that this compilation preserves typability: if 
applied to a well-typed term, the result is also well-typed. The complication arises 
from the fact that when faced with an empty object {}, we have no idea which 
constructor to use without being given further typing information. So rather 
than force type checking or annotation in advance for this one degenerate case, 
we instead collapse them all into a single data type so that there is no need to 
differentiate based on the type. In contrast, the translation of non-empty objects 
is straightforward, since we can use the name of any one of the destructors to 
determine the codata type it is associated with, which then informs us of the 
correct constructor to use. 
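
As an illustration of the tabulating translation, consider a hypothetical stream codata type with destructors Head and Tail (an assumption for this sketch, not a declaration taken from this section). In Haskell, whose data constructors are lazy, the tabulated form might look as follows; `TableStream`, `headS`, `tailS`, `countFrom`, and `takeS` are hypothetical names.

```haskell
-- Tabulated form of a codata type with destructors Head and Tail:
-- a single constructor holding one field per destructor.
data Stream a = TableStream a (Stream a)

-- t.Head and t.Tail become case expressions projecting the matching field.
headS :: Stream a -> a
headS s = case s of TableStream y1 _ -> y1

tailS :: Stream a -> Stream a
tailS s = case s of TableStream _ y2 -> y2

-- The object { Head -> n, Tail -> countFrom (n + 1) } becomes a constructor
-- application; laziness keeps the infinite tail unevaluated until observed.
countFrom :: Integer -> Stream Integer
countFrom n = TableStream n (countFrom (n + 1))

-- Codata types with no destructors all collapse into one unit type.
data Unit = Unit

emptyObject :: Unit
emptyObject = Unit

-- Observe the first k elements.
takeS :: Int -> Stream a -> [a]
takeS k s
  | k <= 0    = []
  | otherwise = headS s : takeS (k - 1) (tailS s)
```

For example, `takeS 3 (countFrom 5)` yields `[5,6,7]`.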


3.4 Correctness 


For the inter-compilations between $\lambda^{codata}$ and $\lambda^{data}$ to be useful in practice,
they should preserve the semantics of programs. For now, we focus only on the
call-by-name semantics for each of the languages. With the static aspect of the
semantics, this means they should preserve the typing of terms.


Proposition 1 (Type Preservation). For each of the $\mathcal{V}$ and $\mathcal{T}$ translations:
if $\Gamma \vdash t : \tau$ then $[\![\Gamma]\!] \vdash [\![t]\!] : [\![\tau]\!]$ (in the call-by-name type system).


Proof (Sketch). By induction on the typing derivation of $\Gamma \vdash t : \tau$.


With the dynamic aspect of the semantics, the translations should preserve the 
outcome of evaluation (either converging to some value, diverging into an infinite 
loop, or getting stuck) for both typed and untyped terms. This works because 
each translation preserves the reduction steps, values, and evaluation contexts 
of the source calculus’ call-by-name operational semantics. 


Proposition 2 (Evaluation Preservation). For each of the $\mathcal{V}$ and $\mathcal{T}$
translations: $t \twoheadrightarrow V$ if and only if $[\![t]\!] \twoheadrightarrow [\![V]\!]$ (in the call-by-name semantics).


Proof (Sketch). The forward (“only if”) implication is a result of the following
facts that hold for each translation in the call-by-name semantics:

— For any redex $t$ in the source, if $t \mapsto t'$ then $[\![t]\!] \mapsto t'' \twoheadrightarrow [\![t']\!]$.

— For any value $V$ in the source, $[\![V]\!]$ is a value.

— For any evaluation context $E$ in the source, there is an evaluation context $E'$
in the target such that $[\![E[t]]\!] = E'[[\![t]\!]]$ for all $t$.


The reverse (“if”) implication then follows from the fact that the call-by-name 
operational semantics of both source and target languages is deterministic. 


3.5 Call-by-Value: Correcting the Evaluation Order 


The presented inter-compilation techniques are correct for the call-by-name 
semantics of the calculi. But what about the call-by-value semantics? It turns 
out that the simple translations seen so far do not correctly preserve the call- 
by-value semantics of programs, but they can be easily fixed by being more 
careful about how they treat the values of the source and target calculi. In other 
words, we need to make sure that values are translated to values, and evaluation 
contexts to evaluation contexts. For instance, the following translation (up to 
renaming) does not preserve the call-by-value semantics of the source program: 


\[
\mathcal{T}[\![\{Fst \to error, Snd \to True\}]\!] = Pair\ error\ True
\]


The object $\{Fst \to error, Snd \to True\}$ is a value in call-by-value, and the
erroneous response to Fst will only be evaluated when observed. However, the
structure Pair error True is not a value in call-by-value, because the field error
must be evaluated in advance, which causes an error immediately. In the other
direction, we could also have


\[
\mathcal{V}[\![Pair\ error\ True]\!] = \{Case \to \lambda v.\ v.Pair\ error\ True\}
\]


Here, the immediate error in Pair error True has become incorrectly delayed
inside the value $\{Case \to \lambda v.\ v.Pair\ error\ True\}$.

The solution to this problem is straightforward: we must manually delay 
computations that are lifted out of (object or $\lambda$) abstractions, and manually
force computations before their results are hidden underneath abstractions. For 
the visitor pattern, the correction is to only introduce the codata object on 
constructed values. We can handle other constructed terms by naming their 
non-value components in the style of administrative-normalization like so: 


\[
\begin{aligned}
\mathcal{V}[\![K_i\ \overline{V}]\!] &= \{Case_T \to \lambda v.\ v.K_i\ \overline{V}\} \\
\mathcal{V}[\![K_i\ \overline{V}\ u\ \overline{t}]\!] &= \textbf{let}\ x = u\ \textbf{in}\ \mathcal{V}[\![K_i\ \overline{V}\ x\ \overline{t}]\!] \quad \text{if } u \text{ is not a value}
\end{aligned}
\]

Conversely, the tabulating translation $\mathcal{T}$ will cause the on-demand
observations of the object to be converted to preemptive components of a tuple
structure. To counter this change in evaluation order, a thunking technique can be
employed as follows:

\[
\begin{aligned}
\mathcal{T}[\![t.H_i]\!] &= \textbf{case}\ \mathcal{T}[\![t]\!]\ \{Table_U\ y_1 \ldots y_n \to force\ y_i\} \\
\mathcal{T}[\![\{H_1 \to e_1, \ldots, H_n \to e_n\}]\!] &= Table_U\ (delay\ \mathcal{T}[\![e_1]\!]) \ldots (delay\ \mathcal{T}[\![e_n]\!])
\end{aligned}
\]

The two operations can be implemented as $delay\ t = \lambda z.\ t$ and $force\ t = t\ unit$
as usual, but can also be implemented as more efficient memoizing operations.
With all these corrections, Propositions 1 and 2 also hold for the call-by-value
type system and operational semantics.
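
The thunking correction can be sketched in Haskell. Since Haskell is lazy by default, this sketch uses strict constructor fields to imitate call-by-value constructors; that choice, and the names `Thunk`, `PairTable`, `object`, `fstP`, and `sndP`, are assumptions made for illustration only.

```haskell
-- Explicit thunks, following delay t = \z -> t and force t = t unit.
type Thunk a = () -> a

delay :: a -> Thunk a
delay t = \_ -> t

force :: Thunk a -> a
force t = t ()

-- Strict fields imitate call-by-value constructors: each field is evaluated
-- (to weak head normal form) when the constructor is applied.
data PairTable = PairTable !(Thunk Int) !(Thunk Bool)

-- Tabulation of the object { Fst -> error, Snd -> True }. Each response is
-- delayed, so constructing the table does not raise the error.
object :: PairTable
object = PairTable (delay (error "Fst observed")) (delay True)

-- Destructor applications project a field and then force it.
fstP :: PairTable -> Int
fstP (PairTable y1 _) = force y1

sndP :: PairTable -> Bool
sndP (PairTable _ y2) = force y2
```

Observing `sndP object` returns `True` without touching the erroneous first component, while `fstP object` raises the error only at the point of observation.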


3.6 Indexed Data and Codata Types: Type Equalities 


In the world of types, we have so far only formally addressed inter-compilation 
between languages with simple and polymorphic types. What about the compi- 
lation of indexed data and codata types? It turns out some of the compilation 
techniques we have discussed so far extend to type indexes without further effort, 
whereas others need some extra help. In particular, the visitor-pattern-based 
translation $\mathcal{V}$ can just be applied straightforwardly to indexed data types:


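\[
\mathcal{V}\left[\!\!\left[
\begin{array}{l}
\textbf{data}\ T\ \overline{a}\ \textbf{where} \\
\quad K_1 : \overline{\tau_1} \to T\ \overline{\rho_1} \\
\quad \vdots \\
\quad K_n : \overline{\tau_n} \to T\ \overline{\rho_n}
\end{array}
\right]\!\!\right]
=
\begin{array}{l}
\textbf{codata}\ T_{visit}\ \overline{a}\ b\ \textbf{where} \\
\quad K_1 : T_{visit}\ \overline{\rho_1}\ b \to \overline{\tau_1} \to b \\
\quad \vdots \\
\quad K_n : T_{visit}\ \overline{\rho_n}\ b \to \overline{\tau_n} \to b \\
\textbf{codata}\ T\ \overline{a}\ \textbf{where} \\
\quad Case_T : T\ \overline{a} \to \forall b.\ T_{visit}\ \overline{a}\ b \to b
\end{array}
\]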


In this case, the notion of an indexed visitor codata type exactly corresponds 
to the mechanics of case expressions for GADTs. In contrast, the tabulation 
translation $\mathcal{T}$ does not correctly capture the semantics of indexed codata types,
if applied naively. 

Thankfully, there is a straightforward way of “simplifying” indexed data 
types to more conventional data types using some built-in support for type equal- 
ities. The idea is that a constructor with a more specific return type can be 
replaced with a conventional constructor that is parameterized by type equali- 
ties that prove that the normal return type must be the more specific one. The 
same idea can be applied to indexed codata types as well. A destructor that can 
only act on a more specific instance of the codata type can instead be replaced by 
one which works on any instance, but then immediately asks for proof that the 
object’s type is the more specific one before completing the observation. These 
two translations, of replacing type indexes with type equalities, are defined as: 


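\[
\mathcal{E}q\left[\!\!\left[
\begin{array}{l}
\textbf{data}\ T\ \overline{a}\ \textbf{where} \\
\quad K_1 : \overline{\tau_1} \to T\ \overline{\rho_1} \\
\quad \vdots \\
\quad K_n : \overline{\tau_n} \to T\ \overline{\rho_n}
\end{array}
\right]\!\!\right]
=
\begin{array}{l}
\textbf{data}\ T\ \overline{a}\ \textbf{where} \\
\quad K_1 : \overline{a} = \overline{\rho_1} \to \overline{\tau_1} \to T\ \overline{a} \\
\quad \vdots \\
\quad K_n : \overline{a} = \overline{\rho_n} \to \overline{\tau_n} \to T\ \overline{a}
\end{array}
\]

\[
\mathcal{E}q\left[\!\!\left[
\begin{array}{l}
\textbf{codata}\ U\ \overline{a}\ \textbf{where} \\
\quad H_1 : U\ \overline{\rho_1} \to \tau_1 \\
\quad \vdots \\
\quad H_n : U\ \overline{\rho_n} \to \tau_n
\end{array}
\right]\!\!\right]
=
\begin{array}{l}
\textbf{codata}\ U\ \overline{a}\ \textbf{where} \\
\quad H_1 : U\ \overline{a} \to \overline{a} = \overline{\rho_1} \to \tau_1 \\
\quad \vdots \\
\quad H_n : U\ \overline{a} \to \overline{a} = \overline{\rho_n} \to \tau_n
\end{array}
\]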

This formalizes the intuition that indexed data types can be thought of as
enriching constructors to carry around additional constraints that were available at
their time of construction, whereas indexed codata types can be thought of as 
guarding methods with additional constraints that must be satisfied before an 
observation can be made. Two of the most basic examples of this simplification 
are for the type declarations which capture the notion of type equality as an 
indexed data or indexed codata type, which are defined and simplified like so: 


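\[
\mathcal{E}q\left[\!\!\left[
\begin{array}{l}
\textbf{data}\ Eq\ a\ b\ \textbf{where} \\
\quad Refl : Eq\ a\ a
\end{array}
\right]\!\!\right]
=
\begin{array}{l}
\textbf{data}\ Eq\ a\ b\ \textbf{where} \\
\quad Refl : a = b \to Eq\ a\ b
\end{array}
\]

\[
\mathcal{E}q\left[\!\!\left[
\begin{array}{l}
\textbf{codata}\ IfEq\ a\ b\ c\ \textbf{where} \\
\quad AssumeEq : IfEq\ a\ a\ c \to c
\end{array}
\right]\!\!\right]
=
\begin{array}{l}
\textbf{codata}\ IfEq\ a\ b\ c\ \textbf{where} \\
\quad AssumeEq : IfEq\ a\ b\ c \to a = b \to c
\end{array}
\]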
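
The data half of this simplification has a close analogue in GHC Haskell, where an indexed return type can be traded for an equality constraint carried by the constructor. The sketch below is illustrative only; the names `EqI`, `EqS`, `simplify`, `index`, and `castWith` are hypothetical, and GHC's `~` constraint stands in for the built-in type equalities assumed in the text.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

-- Indexed form: the constructor's return type is the more specific EqI a a.
data EqI a b where
  ReflI :: EqI a a

-- Simplified form: a conventional return type, with the index replaced by an
-- equality constraint that the constructor carries around.
data EqS a b where
  ReflS :: (a ~ b) => EqS a b

-- The two forms are interconvertible.
simplify :: EqI a b -> EqS a b
simplify ReflI = ReflS

index :: EqS a b -> EqI a b
index ReflS = ReflI

-- Matching on the simplified form brings a ~ b into scope, which justifies
-- treating a value of type a as a value of type b.
castWith :: EqS a b -> a -> b
castWith ReflS x = x
```

For example, `castWith (simplify ReflI) (3 :: Int)` returns 3, because matching on the constructor supplies the evidence that both type arguments coincide.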


With the above ability to simplify away type indexes, all of the presented
compilation techniques are easily generalized to indexed data and codata types by
composing them with $\mathcal{E}q$. As a practical programming example, consider the
following safe stack codata type indexed by its number of elements.


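\[
\begin{array}{l}
\textbf{codata}\ Stack\ a\ \textbf{where} \\
\quad Pop : Stack\ (Succ\ a) \to (\mathbb{Z}, Stack\ a) \\
\quad Push : Stack\ a \to \mathbb{Z} \to Stack\ (Succ\ a)
\end{array}
\]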


This stack type is safe in the sense that the Pop operation can only be applied to
non-empty Stacks. We cannot compile this to a data type via $\mathcal{T}$ directly, because
that translation does not apply to indexed codata types. However, if we first
simplify the Stack type via $\mathcal{E}q$, we learn that we can replace the type of the
Pop destructor with $Pop : Stack\ a \to \forall b.\ a = Succ\ b \to (\mathbb{Z}, Stack\ b)$, whereas
the Push destructor is already simple, so it can be left alone. That way, for any
object $s : Stack\ Zero$, even though a client can initiate the observation $s.Pop$, it
will never be completed since there is no way to choose a $b$ and prove that Zero
equals $Succ\ b$. Therefore, the net result of the combined $\mathcal{T} \circ \mathcal{E}q$ translation turns
Stack into the following data type, after some further simplification:


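\[
\begin{array}{l}
\textbf{data}\ Stack\ a\ \textbf{where} \\
\quad MkS : (\forall b.\ a = Succ\ b \to (\mathbb{Z}, Stack\ b)) \to (\mathbb{Z} \to Stack\ (Succ\ a)) \to Stack\ a
\end{array}
\]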


Notice how the constructor of this type has two fields, one for Pop and one for
Push, respectively. However, the Pop operation is guarded by a proof obligation:
the client can only receive the top integer and remaining stack if they prove
that the original stack contains a non-zero number of elements.
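
The resulting Stack data type can be approximated in GHC Haskell. This is a sketch under assumptions: `Integer` stands in for the integer type, the propositional equality `:~:` from `Data.Type.Equality` plays the role of the type equality, and the helper names `pop`, `push`, `empty`, and `cons` are hypothetical.

```haskell
{-# LANGUAGE DataKinds, EmptyCase, GADTs, KindSignatures, RankNTypes, TypeOperators #-}

import Data.Type.Equality ((:~:) (Refl))

-- Type-level natural numbers counting the elements of a stack.
data Nat = Zero | Succ Nat

-- One field per destructor: the Pop field demands a proof that the index is
-- a successor, the Push field needs no proof.
data Stack (a :: Nat) =
  MkS (forall b. a :~: 'Succ b -> (Integer, Stack b))
      (Integer -> Stack ('Succ a))

-- Popping a stack whose index is visibly a successor: the proof is Refl.
pop :: Stack ('Succ n) -> (Integer, Stack n)
pop (MkS p _) = p Refl

push :: Stack a -> Integer -> Stack ('Succ a)
push (MkS _ q) = q

-- The empty stack: no proof of Zero = Succ b exists, so the Pop field can
-- never be run and has no cases to handle.
empty :: Stack 'Zero
empty = MkS (\eq -> case eq of {}) (\x -> cons x empty)

-- A stack holding x on top of s.
cons :: Integer -> Stack n -> Stack ('Succ n)
cons x s = MkS (\Refl -> (x, s)) (\y -> cons y (cons x s))
```

With these definitions, `fst (pop (push (push empty 1) 2))` is 2, while `pop empty` is rejected by the type checker.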


4 Compilation in Practice 


We have shown how data and codata are related through the use of two different
core calculi. To explore how these ideas manifest in practice, we have
implemented codata in a couple of settings. First, we extended Haskell with codata
in order to compare the lazy and codata approaches to demand-driven
programming described in Sect. 2.2.³ Second, we have created a prototype language⁴ with
indexed (co)data types to further explore the interaction between the
compilation and target languages. The prototype language does not commit to a
particular evaluation strategy, typing discipline, or paradigm; instead this decision
is made when compiling a program to one of several backends. The supported
backends include functional ones, namely Haskell (call-by-need, static types), OCaml
(call-by-value, static types), and Racket (call-by-value, dynamic types), as well
as the object-oriented JavaScript. The following issues of complex copattern
matching and sharing apply to both implementations; the performance results
on the efficiency of memoized codata objects were obtained with the Haskell
extension, for comparison with conventional Haskell code.


Table 1. Fibonacci scaling tests for the GHC implementation

n       | Time (s), codata | Time (s), data | Allocs (bytes), codata | Allocs (bytes), data
10000   | 0.02             | 0.01           | 10,143,608             | 6,877,048
100000  | 0.39             | 0.27           | 495,593,464            | 463,025,832
1000000 | 19.64            | 18.54          | 44,430,524,144         | 44,104,487,488


Complex Copattern Matching. Our implementations support nested copat- 
terns so that objects can respond to chains of multiple observations, even though 
$\lambda^{codata}$ only provides flat copatterns. This extension does not enhance the
language expressivity but allows more succinct programs [2]. A flattening step is
needed to compile nested copatterns down to a core calculus, which has been 
explored in previous work by Setzer et al. [37] and Thibodeau [39] and imple- 
mented in OCaml by Regis-Gianas and Laforgue [33]. Their flattening algo- 
rithm requires copatterns to completely cover the object’s possible observations 
because the coverage information is used to drive flattening. This approach was 
refined and incorporated in a dependently typed setting by Cockx and Abel [11]. 
With our goal of supporting codata independently of typing discipline and cov- 
erage analysis, we have implemented the purely syntax driven approach to flat- 
tening found in [38]. For example, the prune function from Sect. 2.2 expands to: 


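\[
\begin{array}{l}
prune = \lambda x \to \lambda t \to \\
\quad \{\ Node \to t.Node, \\
\quad \ \ Children \to \textbf{case}\ x\ \textbf{of} \\
\qquad\qquad 0 \to [\,] \\
\qquad\qquad \_ \to map\ (prune\ (x-1))\ t.Children\ \}
\end{array}
\]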
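
The flattened prune can be sketched in plain Haskell, with a lazy record standing in for the tree codata type. The record type `Tree`, its fields `node` and `children`, and the helpers `grow` and `size` are assumptions made for this illustration (the tree type of Sect. 2.2 is not reproduced here).

```haskell
-- An infinite tree viewed as codata: a record of responses to the two
-- observations Node and Children (lazy fields, so nothing is computed early).
data Tree = Tree { node :: Int, children :: [Tree] }

-- An infinite binary tree whose nodes are labelled by depth.
grow :: Int -> Tree
grow d = Tree { node = d, children = [grow (d + 1), grow (d + 1)] }

-- Flattened prune: one object with a response per destructor; the case on the
-- depth argument sits inside the Children response.
prune :: Int -> Tree -> Tree
prune x t = Tree
  { node     = node t
  , children = case x of
      0 -> []
      _ -> map (prune (x - 1)) (children t)
  }

-- Count the nodes of a finite (pruned) tree.
size :: Tree -> Int
size t = 1 + sum (map size (children t))
```

Pruning the infinite tree to depth 2 leaves 1 + 2 + 4 nodes, so `size (prune 2 (grow 0))` is 7.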


3 The GHC fork is at https://github.com/zachsully/ghc/tree/codata-macro.
4 The prototype compiler is at https://github.com/zachsully/dl/tree/esop2019.


Syntax

\[
\begin{array}{lrcl}
\text{Values} & V & ::= & \cdots \mid \{H \to V\} \\
\text{Terms} & t, u, e & ::= & \cdots \mid t.H \mid \{H \to V\} \mid \textbf{let}_{need}\ x = t\ \textbf{in}\ e
\end{array}
\]

Transformation

\[
\begin{aligned}
\mathcal{A}[\![t.H]\!] &= \mathcal{A}[\![t]\!].H \\
\mathcal{A}[\![\{H \to t\}]\!] &= \textbf{let}_{need}\ x = \mathcal{A}[\![t]\!]\ \textbf{in}\ \{H \to x\}
\end{aligned}
\]

Fig. 6. Memoization of $\lambda^{codata}$


Sharing. If codata is to be used instead of laziness for demand-driven
programming, then it must have the same performance characteristics, which relies on
sharing the results of computations [6]. To test this, we compare the performance
of calculating streams of Fibonacci numbers (the poster child for sharing)
implemented with both lazy list data types and a stream codata type in Haskell
extended with codata. These tests, presented in Table 1, show that the
codata version is always slower and allocates more than the lazy
list version, but the difference is small and the two versions scale at the same
rate. These performance tests are evidence that codata shares the same
information when compiled to a call-by-need language; this we get for free because
call-by-need data constructors (which codata is compiled into via $\mathcal{T}$) memoize
their fields. In an eager setting, it is enough to use memoized versions of delay
and force, which are introduced by the call-by-value compilation described in
Sect. 3.5. This sharing is confirmed by the OCaml and Racket backends of the
prototype language, which find the 100th Fibonacci number in less than a second (a task
that takes hours without sharing).
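
The two Fibonacci definitions being compared might look roughly as follows in plain Haskell. This is not the benchmark code from the GHC fork; it is a sketch in which the stream codata type is written directly in its tabulated form, and the names `fibsList`, `Stream`, `zipWithS`, `fibsStream`, and `nth` are hypothetical.

```haskell
-- Lazy list version: the classic self-referential definition.
fibsList :: [Integer]
fibsList = 0 : 1 : zipWith (+) fibsList (tail fibsList)

-- Stream codata tabulated as a single-constructor data type, with one lazy
-- field per destructor (Head and Tail).
data Stream a = Stream { hd :: a, tl :: Stream a }

zipWithS :: (a -> b -> c) -> Stream a -> Stream b -> Stream c
zipWithS f xs ys = Stream
  { hd = f (hd xs) (hd ys)
  , tl = zipWithS f (tl xs) (tl ys)
  }

-- The same definition by observations; the constructor fields are memoized by
-- call-by-need, so earlier results are shared rather than recomputed.
fibsStream :: Stream Integer
fibsStream = Stream
  { hd = 0
  , tl = Stream { hd = 1, tl = zipWithS (+) fibsStream (tl fibsStream) }
  }

-- Observe the k-th element of a stream (counting from 0).
nth :: Int -> Stream a -> a
nth 0 s = hd s
nth k s = nth (k - 1) (tl s)
```

Both `nth 100 fibsStream` and `fibsList !! 100` evaluate to 354224848179261915075 almost instantly, which is only possible because intermediate results are shared.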

As the object-oriented representative, the JavaScript backend is a compila- 
tion from data to codata using the visitor pattern presented in Sect. 3.2. Because 
codata remains codata (i.e. JavaScript objects), an optimization must be per- 
formed to ensure the same amount of sharing of codata as the other backends. 
The solution is to lift out the branches of a codata object, as shown in Fig. 6, 
where the call-by-need let-bindings can be implemented by delay and force in 
strict languages as usual. It turns out that this transformation is also needed in 
an alternative compilation technique presented by Regis-Gianas and Laforgue 
[33] where codata is compiled to functions, i.e. another form of codata. 


5 Related Work 


Our work follows in the spirit of Amin et al.’s [3] desire to provide a minimal theory
that can model type parameterization, modules, objects and classes. Another 
approach to combine type parameterization and modules is also offered by 1ML 
[36], which is mapped to System F. Amin et al.’s work goes one step further by 
translating System F to a calculus that directly supports objects and classes. 
Our approach differs in methodology: instead of searching for a logical foun- 
dation of a pre-determined notion of objects, we let the logic guide us while 
exploring what objects are. Even though there is no unanimous consensus that 
functional and object-oriented paradigms should be combined, there have been 
several hybrid languages for combining both styles of programming, including 
Scala, the Common Lisp Object System [8], Objective ML [34], and a proposed 
but unimplemented object system for Haskell [30].

Arising out of the correspondence between programming languages, category
theory, and universal algebras, Hagino [20] first proposed codata as an extension 
to ML to remedy the asymmetry created by data types. In the same way that 
data types represent initial F-algebras, codata types represent final F-coalgebras. 
These structures were implemented in the categorical programming language 
Charity [10]. On the logical side of the correspondence, codata arises naturally 
in the sequent calculus [15, 28, 44] since it provides the right setting to talk about
construction of either the provider (i.e. the term) or the client (i.e. the context) 
side of a computation, and has roots in classical [13,41] and linear logic [18,19]. 

In session-typed languages, which also have a foundation in linear logic, exter- 
nal choice can be seen as a codata (product) type dual to the way internal choice 
corresponds to a data (sum) type. It is interesting that similar problems arise in 
both settings. Balzer and Pfenning [7] discuss an issue that shows up in choos- 
ing between internal and external choice; this corresponds to choosing between 
data and codata, known as the expression problem. They [7] also suggest using 
the visitor pattern to remedy having external choice (codata) without internal 
choice (data) as we do in Sect. 3.2. Of course, session types go beyond codata 
by adding a notion of temporality (via linearity) and multiple processes that 
communicate over channels. 

To explore programming with coinductive types, Ancona and Zucca [4] and 
Jeannin et al. [26] extended Java and OCaml with regular cyclic structures; 
these have a finite representation that can be eagerly evaluated and fully stored 
in memory. A less restricted method of programming these structures was intro- 
duced by Abel et al. [1,2] who popularized the idea of programming by observa- 
tions, i.e. using copatterns. This line of work further developed the functionality 
of codata types in dependently typed languages by adding indexed codata types 
[40] and dependent copattern matching [11], which enabled the specification of 
bisimulation proofs and encodings of productive infinite objects in Agda. We 
build on these foundations by developing codata in practical languages. 

Focusing on implementation, Regis-Gianas and Laforgue [33] added codata 
with a macro transformation in OCaml. As it turns out, this macro definition
corresponds to one of the popular encodings of objects in the $\lambda$-calculus
[27], where codata/objects are compiled to functions from tagged messages to 
method bodies. This compilation scheme requires the use of GADTs for static 
type checking, and is therefore only applicable to dynamically typed languages 
or the few statically typed languages with expressive enough type systems like 
Haskell, OCaml, and dependently typed languages. Another popular technique 
for encoding codata/objects is presented in [31], corresponding to a class-based
organization of dynamic dispatch [21], and is the one used in this paper. This
technique compiles codata/objects to products of methods, which has the advantage
of being applicable in a simply-typed setting. 


6 Conclusion 


We have shown here how codata can be put to use to capture several practical 
programming idioms and applications, besides just modeling infinite structures. 
In order to help incorporate codata in today’s programming languages, we have
shown how to compile between two core languages: one based on the familiar 
notion of data types from functional languages such as Haskell and OCaml, 
and the other one, based on the notion of a structure defined by reactions to 
observations [1]. This paper works toward the goal of providing common ground 
between the functional and object-oriented paradigms; as future work, we would 
like to extend the core with other features of full-fledged functional and object- 
oriented languages. A better understanding of codata clarifies both the theory 
and practice of programming languages. Indeed, this work is guiding us in the 
use of fully-extensional functions for the compilation of Haskell programs. The 
design is motivated by the desire to improve optimizations, in particular the 
ones relying on the “arity” of functions, to be more compositional and work 
between higher-order abstractions. It is interesting that the deepening of our 
understanding of objects is helping us in better compiling functional languages!


Abstract. Software frequently converts data from one representation
to another and vice versa. Naively specifying both conversion directions 
separately is error prone and introduces conceptual duplication. Instead, 
bidirectional programming techniques allow programs to be written which 
can be interpreted in both directions. However, these techniques often 
employ unfamiliar programming idioms via restricted, specialised combi- 
nator libraries. Instead, we introduce a framework for composing bidirec- 
tional programs monadically, enabling bidirectional programming with 
familiar abstractions in functional languages such as Haskell. We demon- 
strate the generality of our approach applied to parsers/printers, lenses, 
and generators/predicates. We show how to leverage compositionality 
and equational reasoning for the verification of round-tripping properties 
for such monadic bidirectional programs. 


1 Introduction 


A bidirectional transformation (BX) is a pair of mutually related mappings 
between source and target data objects. A well-known example solves the view- 
update problem [2] from relational database design. A view is a derived database
table, computed from concrete source tables by a query. The problem is to map 
an update of the view back to a corresponding update on the source tables. This 
is captured by a bidirectional transformation. The bidirectional pattern is found 
in a broad range of applications, including parsing [17,30], refactoring [31], code
generation [21,27], model transformation [32], and XML transformation [25].

When programming a bidirectional transformation, one can separately con- 
struct the forwards and backwards functions. However, this approach duplicates 
effort, is prone to error, and causes subsequent maintenance issues. These prob- 
lems can be avoided by using specialised programming languages that generate 
both directions from a single definition [13,16,33], a discipline known as bidirec- 
tional programming. 

The most well-known language family for BX programming is lenses [13].
A lens captures transformations between sources $S$ and views $V$ via a pair of
functions $get : S \to V$ and $put : V \to S \to S$. The get function extracts a view
from a source and put takes an updated view and a source as inputs to produce
an updated source. The asymmetrical nature of get and put makes it possible
for put to recover some of the source data that is not present in the view. In
other words, get does not have to be injective to have a corresponding put.

L. Xia et al. In: L. Caires (Ed.): ESOP 2019, LNCS 11423, pp. 147-175, 2019.
https://doi.org/10.1007/978-3-030-17184-1_6

Bidirectional transformations typically respect round-tripping laws, captur- 
ing the extent to which the transformations preserve information between the 
two data representations. For example, well-behaved lenses [5,13] should satisfy: 


\[
put\ (get\ s)\ s = s \qquad\qquad get\ (put\ v\ s) = v
\]
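
As a small illustration of these two laws, a lens can be modelled as a record holding the two functions, and the laws can be checked on sample inputs. The names `Lens`, `firstL`, `getThenPut`, and `putThenGet` are hypothetical and are not part of the framework developed in this paper.

```haskell
-- A lens as a pair of functions, with the types given in the text.
data Lens s v = Lens
  { get :: s -> v
  , put :: v -> s -> s
  }

-- A lens focusing on the first component of a pair. The second component is
-- source data that is absent from the view and recovered by put.
firstL :: Lens (a, b) a
firstL = Lens
  { get = fst
  , put = \v (_, b) -> (v, b)
  }

-- put (get s) s = s, checked on one source.
getThenPut :: Eq s => Lens s v -> s -> Bool
getThenPut l s = put l (get l s) s == s

-- get (put v s) = v, checked on one view and one source.
putThenGet :: Eq v => Lens s v -> v -> s -> Bool
putThenGet l v s = get l (put l v s) == v
```

Both `getThenPut firstL (1 :: Int, "a")` and `putThenGet firstL 5 (1 :: Int, "a")` evaluate to `True`; note that `fst` is not injective, yet a corresponding put exists.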


Lens languages are typically designed to enforce these properties. This focus on 
unconditional correctness inevitably leads to reduced practicality in program- 
ming: lens combinators are often stylised and disconnected from established 
programming idioms. In this paper, we instead focus on expressing bidirectional 
programs directly, using monads as an interface for sequential composition. 
Monads are a popular pattern [35] (especially in Haskell) which combinator 
libraries in other domains routinely exploit. Introducing monadic composition to 
BX programming significantly expands the expressiveness of BX languages and 
opens up a route for programmers to explore the connection between BX pro- 
gramming and mainstream uni-directional programming. Moreover, it appears 
that many applications of bidirectional transformations (e.g., parsers and
printers [17]) do not share the lens get/put pattern, and as a result have not been
sufficiently explored. However, monadic composition is known to be an effective 
way to construct at least one direction of such transformations (e.g., parsers). 


Contributions. In this paper, we deliberately avoid the well-tried approach of 
specialised lens languages, instead exploring a novel point in the BX design space 
based on monadic programming, naturally reusing host language constructs. 
We revisit lenses, and two more bidirectional patterns, demonstrating how they 
can be subject to monadic programming. By being uncompromising about the 
monad interface, we expose the essential ideas behind our framework whilst 
maximising its utility. The trade-off with our approach is that we can no longer
enforce correctness in the same way as conventional lenses: our interface does 
not rule out all non-round-tripping BXs. We tackle this issue by proposing a 
new compositional reasoning framework that is flexible enough to characterise a 
variety of round-tripping properties, and simplifies the necessary reasoning. 
Specifically, we make the following contributions: 


— We describe a method to enable monadic composition for bidirectional pro- 
grams (Sect.3). Our approach is based on a construction which generates a 
monadic profunctor, parameterised by two application-specific monads which 
are used to generate the forward and backward directions. 

— To demonstrate the flexibility of our approach, we apply the above method 
to three different problem domains: parsers/printers (Sects. 3 and 4), lenses 
(Sect.5), and generators/predicates for structured data (Sect.6). While the 
first two are well-explored areas in the bidirectional programming literature, 
the third one is a completely new application domain. 

— We present a scalable reasoning framework, capturing notions of compositionality
for bidirectional properties (Sect. 4). We define classes of round-tripping
properties inherent to bidirectionalism, which can be verified by following sim- 
ple criteria. These principles are demonstrated with our three examples. We 
include some proofs for illustration in the paper. The supplementary mate- 
rial [12] contains machine-checked Coq proofs for the main theorems. 

An extended version of this manuscript [36] includes additional definitions, 
proofs, and comparisons in its appendices. 

— We have implemented these ideas as Haskell libraries [12], with two wrappers 
around attoparsec for parsers and printers, and QuickCheck for generators and 
predicates, showing the viability of our approach for real programs. 


We use Haskell for concrete examples, but the programming patterns can be 
easily expressed in many functional languages. We use the Haskell notation of 
assigning type signatures to expressions via an infix double colon “::”. 


1.1 Further Examples of BX 


We introduced lenses briefly above. We now introduce the other two examples 
used in this paper: parsers/printers and generators/predicates. 


Parsing and printing. Programming language tools (such as interpreters, com- 
pilers, and refactoring tools) typically require two intimately linked components: 
parsers and printers, respectively mapping from source code to ASTs and back. 
A simple implementation of these two functions can be given with types: 


parser :: String -> AST        printer :: AST -> String


Parsers and printers are rarely actual inverses to each other, but instead typically 
exhibit a variant of round-tripping such as: 


parser . printer . parser = parser        printer . parser . printer = printer


The left equation describes the common situation that parsing discards informa- 
tion about source code, such as whitespace, so that printing the resulting AST 
does not recover the original source. However, printing retains enough informa- 
tion such that parsing the printed output yields an AST which is equivalent to 
the AST from parsing the original source. The right equation describes the dual: 
printing may map different ASTs to the same string. For example, printed code 
1+2+3 might be produced by left- and right-associated syntax trees. 
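The following sketch makes both equations concrete for the 1+2+3 example. It is an illustration under stated assumptions, not code from this paper: the Expr type, the helper splitOn, and the choice of a left-associating parser are hypothetical, and the parser assumes well-formed input consisting of non-negative integers separated by +.

```haskell
import Data.Char (isSpace)

-- Hypothetical toy AST for sums.
data Expr = Num Int | Add Expr Expr
  deriving (Eq, Show)

-- Printing forgets how the sum was associated.
printer :: Expr -> String
printer (Num n)   = show n
printer (Add l r) = printer l ++ "+" ++ printer r

-- Parsing discards whitespace and always associates to the left.
-- Assumes well-formed input; read fails otherwise.
parser :: String -> Expr
parser src =
  foldl1 Add [Num (read t) | t <- splitOn '+' (filter (not . isSpace) src)]

-- Split a string at every occurrence of the separator character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (t, [])       -> [t]
  (t, _ : rest) -> t : splitOn c rest
```

With these definitions, printer (parser " 1 + 2+3") yields "1+2+3" rather than the original source, and parsing the right-associated tree's output yields the left-associated tree, yet both round-tripping equations above hold on these inputs.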

For particular AST subsets, printing and parsing may actually be left- or 
right- inverses to each other. Other characterisations are also possible, e.g., with 
equivalence classes of ASTs (accounting for reassociations). Alternatively, parsers 
and printers may satisfy properties about the interaction of partially-parsed 
inputs with the printer and parser, e.g., if parser :: String -> (AST, String):


(let (x, s') = parser s in parser ((printer x) ++ s')) = parser s

Thus, parsing and printing follow a pattern of inverse-like functions which does
not fit the lens paradigm. The pattern resembles lenses between a source (source 
code) and view (ASTs), but with a compositional notion for the source and 
partial “gets” which consume some of the source, leaving a remainder. 

Writing parsers and printers by hand is often tedious due to the redundancy 
implied by their inverse-like relation. Thus, various approaches have been pro- 
posed for reducing the effort of writing parsers/printers by generating both from 
a common definition [17, 19, 30].


Generating and checking. Property-based testing (e.g., QuickCheck) [10]
expresses program properties as executable predicates. For instance, the following
property checks that an insertion function insert, given a sorted list (as
checked by the predicate isSorted :: [Int] -> Bool), produces another sorted
list. The combinator ==> represents implication for properties.

propInsert :: Int -> [Int] -> Property
propInsert val list = isSorted list ==> isSorted (insert val list)


To test this property, a testing framework generates random inputs for val and 
list. The implementation of ==> applied here first checks whether list is
sorted, and if it is, checks that insert val list is sorted as well. This process 
is repeated with further random inputs until either a counterexample is found 
or a predetermined number of test cases pass. 

However, this naive method is inefficient: many properties such as propInsert 
have preconditions which are satisfied by an extremely small fraction of inputs. In 
this case, the ratio of sorted lists among lists of length n is inversely proportional 
to n!, so most generated inputs will be discarded for not satisfying the isSorted 
precondition. Such tests give no information about the validity of the predicate 
being tested and thus are prohibitively inefficient. 

When too many inputs are being discarded, the user must instead supply 
the framework with custom generators of values satisfying the precondition: 
genSorted :: Gen [Int]. 

One can expect two complementary properties of such a generator. A genera- 
tor is sound with respect to the predicate isSorted if it generates only values sat- 
isfying isSorted; soundness means that no tests are discarded, hence the tested 
property is better exercised. A generator is complete with respect to isSorted 
if it can generate all satisfying values; completeness ensures the correctness of 
testing a property with isSorted as a precondition, in the sense that if there 
is a counterexample, it will be eventually generated. In this setting of testing, 
completeness, which affects the potential adequacy of testing, is arguably more 
important than soundness, which affects only efficiency. 
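Soundness and completeness can be illustrated without a testing library by modelling a generator as a pure function from raw (for instance, random) input to values. This is only a sketch: sortedFrom and rawFor are hypothetical names, a real genSorted :: Gen [Int] would draw the raw input inside QuickCheck's Gen monad, and integer overflow is ignored.

```haskell
-- The predicate that the generator should agree with.
isSorted :: [Int] -> Bool
isSorted xs = and (zipWith (<=) xs (drop 1 xs))

-- Hypothetical pure stand-in for a generator: the first raw number is the
-- starting point, and the remaining ones are turned into non-negative steps.
sortedFrom :: [Int] -> [Int]
sortedFrom []       = []
sortedFrom (x : ds) = scanl (+) x (map abs ds)

-- Witness for completeness: a raw input that reproduces a given sorted list.
rawFor :: [Int] -> [Int]
rawFor []         = []
rawFor xs@(x : _) = x : zipWith (-) (drop 1 xs) xs
```

Soundness corresponds to isSorted (sortedFrom raw) holding for every raw input; completeness corresponds to sortedFrom (rawFor xs) == xs for every sorted list xs, so that each satisfying value is reachable.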

It is clear that generators and predicates are closely related, forming a pat- 
tern similar to that of bidirectional transformations. Given that good generators 
are usually difficult to construct, the ability to extract both from a common 
specification with bidirectional programming is a very attractive alternative. 


Roadmap. We begin by outlining a concrete example of our monadic approach 
via parsers and printers (Sect. 2), before explaining the general approach of using 
monadic profunctors to structure bidirectional programs (Sect. 3). Section 4 then
presents a compositional reasoning framework for monadic bidirectional pro- 
grams, with varying degrees of strength adapted to different round-tripping 
properties. We then replay the developments of the earlier sections to define 
lenses as well as generators and predicates in Sects. 5 and 6. 


2 Monadic Bidirectional Programming 


A bidirectional parser, or biparser, combines both a parsing direction and print- 
ing direction. Our first novelty here is to express biparsers monadically. 

In code samples, we use the Haskell pun of naming variables after their types, 
e.g., a variable of some abstract type v will also be called v. Similarly, for some 
type constructor m, a variable of type m v will be called mv. A function u -> m v
(a Kleisli arrow for a monad m) will be called kv. 


Monadic parsers. The following data type provides the standard way to describe 
parsers of values of type v which may consume only part of the input string: 


data Parser v = Parser { parse :: String -> (v, String) }


It is well-known that such parsers are monadic [35], i.e., they have a notion of 
monadic sequential composition embodied by the interface: 


instance Monad Parser where
  (>>=)  :: Parser v -> (v -> Parser w) -> Parser w
  return :: v -> Parser v


The sequential composition operator (>>=), called bind, describes the scheme 
of constructing a parser by sequentially composing two sub-parsers where the 
second depends on the output of the first; a parser of w values is made up of a 
parser of v and a parser of w that depends on the previously parsed v. Indeed, 
this is the implementation given to the monadic interface: 


pv >>= kw = Parser (\s -> let (v, s') = parse pv s in parse (kw v) s')
return v  = Parser (\s -> (v, s))


Bind first runs the parser pv on an input string s, resulting in a value v which is 
used to create the parser kw v, which is in turn run on the remaining input s'
to produce parsed values of type w. The return operation creates a trivial parser 
for any value v which does not consume any input but simply produces v. 
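A self-contained version of this parser monad is sketched below. Current GHC versions require Functor and Applicative instances alongside Monad, so they are included; the primitive item and the example counted are hypothetical names added for illustration, and failure on empty or malformed input is not handled.

```haskell
import Data.Char (digitToInt)

newtype Parser v = Parser { parse :: String -> (v, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser (\s -> let (v, s') = p s in (f v, s'))

instance Applicative Parser where
  pure v = Parser (\s -> (v, s))
  pf <*> pv = pf >>= \f -> fmap f pv

instance Monad Parser where
  pv >>= kw = Parser (\s -> let (v, s') = parse pv s in parse (kw v) s')

-- Hypothetical primitive: consume one character (undefined on empty input).
item :: Parser Char
item = Parser (\(c : s) -> (c, s))

-- The second parser depends on the value produced by the first:
-- read a digit n, then consume exactly n further characters.
counted :: Parser String
counted = item >>= \d -> mapM (const item) [1 .. digitToInt d]
```

For example, parse counted "3abcde" evaluates to ("abc", "de"): the digit determines how much of the remaining input the second parser consumes.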

In practice, parsers composed with (>>=) often have a relationship between 
the output types of the two operands: usually that the former “contains” the 
latter in some sense. For example, we might parse an expression and compose 
this with a parser for statements, where statements contain expressions. This 
relationship will be useful later when we consider printers. 

As a shorthand, we can discard the remaining unparsed string of a parser 
using projection, giving a helper function parser :: Parser v -> (String -> v).


Monadic printers. Our goal is to augment parsers with their inverse printer,
such that we have a monadic type Biparser which provides two complementary 
(bi-directional) transformations: 


parser  :: Biparser v -> (String -> v)
printer :: Biparser v -> (v -> String)


However, this type of printer v -> String (shown also in Sect. 1.1) cannot form
a monad because it is contravariant in its type parameter v. Concretely, we 
cannot implement the bind (>>=) operator for values with types of this form: 


-- Failed attempt
bind :: (v -> String) -> (v -> (w -> String)) -> (w -> String)
bind pv kw = \w -> let v = (??) in pv v ++ kw v w


We are stuck trying to fill the hole (??) as there is no way to get a value of type v 
to pass as an argument to pv (first printer) and kw (second printer which depends 
on a v). Subsequently, we cannot construct a monadic biparser by simply taking 
a product of the parser monad and v -> String and leveraging the result that
the product of two monads is a monad. 
But what if the type variables of bind were related by containment, such that 
v is contained within w and thus we have a projection w -> v? We could use this
projection to fill the hole in the failed attempt above, defining a bind-like operator:

bind' :: (w -> v) -> (v -> String) -> (v -> (w -> String)) -> (w -> String)
bind' from pv kw = \w -> let v = from w in pv v ++ kw v w


This is closer to the monadic form, where from :: w -> v resolves the difficulty
of contravariance by “contextualizing” the printers. Thus, the first printer is no 
longer just “a printer of v”, but “a printer of v extracted from w”. In the context 
of constructing a bidirectional parser, having such a function to hand is not an 
unrealistic expectation: recall that when we compose two parsers, typically the 
values of the first parser for v are contained within the values returned by the 
second parser for w, thus a notion of projection can be defined and used here to 
recover a v in order to build the corresponding printer compositionally. 

Of course, this is still not a monad. However, it suggests a way to generate a 
monadic form by putting the printer and the contextualizing projection together, 
(w -> v, v -> String), and fusing them into (w -> (v, String)). This has
the advantage of removing the contravariant occurrence of v, yielding a data type:


data Printer w v = Printer { print :: w -> (v, String) }


If we fix the first parameter type w, then the type Printer w of printers for w 
values is indeed monadic, combining a reader monad (for some global read-only 
parameter of type w) and a writer monad (for strings), with implementation: 


instance Monad (Printer w) where
  return :: v -> Printer w v
  return = \v -> Printer (\_ -> (v, ""))

  (>>=) :: Printer w v -> (v -> Printer w t) -> Printer w t
  pv >>= kt = Printer (\w -> let (v, s)  = print pv w
                                 (t, s') = print (kt v) w in (t, s ++ s'))

The printer return v ignores its input and prints nothing. For bind, an input w
is shared by both printers and the resulting strings are concatenated. 

We can adapt the contextualisation of a printer by the following operation 
which amounts to pre-composition, witnessing the fact that Printer is a con- 
travariant functor in its first parameter: 


comap :: (w -> w') -> Printer w' v -> Printer w v
comap from (Printer f) = Printer (f . from)
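The printer monad and comap can be exercised on their own with the following sketch. The Functor and Applicative instances are required by current GHC versions; the primitive charP and the example pairP are hypothetical names introduced for illustration, and Prelude's print is hidden to avoid a name clash with the record field.

```haskell
import Prelude hiding (print)

newtype Printer w v = Printer { print :: w -> (v, String) }

instance Functor (Printer w) where
  fmap f (Printer p) = Printer (\w -> let (v, s) = p w in (f v, s))

instance Applicative (Printer w) where
  pure v = Printer (\_ -> (v, ""))
  pf <*> pv = pf >>= \f -> fmap f pv

instance Monad (Printer w) where
  pv >>= kt = Printer (\w -> let (v, s)  = print pv w
                                 (t, s') = print (kt v) w
                             in (t, s ++ s'))

comap :: (w -> w') -> Printer w' v -> Printer w v
comap from (Printer f) = Printer (f . from)

-- Hypothetical primitive: print one character and return it.
charP :: Printer Char Char
charP = Printer (\c -> (c, [c]))

-- Print both components of a pair, each extracted by a projection.
pairP :: Printer (Char, Char) (Char, Char)
pairP = do
  a <- comap fst charP
  b <- comap snd charP
  return (a, b)
```

Here print pairP ('x', 'y') evaluates to (('x', 'y'), "xy"): the shared input is the pair, each sub-printer sees only its projected component, and the outputs are concatenated.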


2.1 Monadic Biparsers 


So far so good: we now have a monadic notion of printers. However, our goal is 
to combine parsers and printers in a single type. Since we have two monads, we 
use the standard result that a product of monads is a monad, defining biparsers: 

data Biparser u v = Biparser { parse :: String -> (v, String)
                             , print :: u -> (v, String) }


By pairing parsers and printers we have to unify their covariant parameters. 
When both the type parameters of Biparser are the same it is easy to interpret 
this type: a biparser Biparser v v is a parser from strings to v values and 
printer from v values to strings. We refer to biparsers of this type as aligned 
biparsers. What about when the type parameters differ? A biparser of type 
Biparser u v provides a parser from strings to v values and a printer from u 
values to strings, but where the printers can compute v values from u values, 
i.e., u is some common broader representation which contains relevant v-typed 
subcomponents. A biparser Biparser u v can be thought of as printing a certain 
subtree v from the broader representation of a syntax tree u. 

The corresponding monad for Biparser is the product of the previous two 
monad definitions for Parser and Printer, allowing both to be composed sequen- 
tially at the same time. To avoid duplication we elide the definition here which 
is shown in full in Appendix A of the extended version [36].

We can also lift the previous notion of comap from printers to biparsers, which 
gives us a way to contextualize a printer: 


comap :: (u -> u') -> Biparser u' v -> Biparser u v
comap f (Biparser parse print) = Biparser parse (print . f)

upon :: Biparser u' v -> (u -> u') -> Biparser u v
upon = flip comap


In the rest of this section, we use the alias "upon" for comap with flipped
parameters where we read p `upon` subpart as applying the printer of
p :: Biparser u' v on a subpart of an input of type u calculated by
subpart :: u -> u', thus yielding a biparser of type Biparser u v.


An example biparser. Let us write a biparser, string :: Biparser String String,
for strings which are prefixed by their length and a space. For example, the
following unit tests should be true:

test1 = parse string "6 lambda calculus" == ("lambda", " calculus")
test2 = print string "SKI" == ("SKI", "3 SKI")

We start by defining a primitive biparser of single characters as: 


char :: Biparser Char Char
char = Biparser (\(c : s) -> (c, s)) (\c -> (c, [c]))


A character is parsed by deconstructing the source string into its head and tail. 
For brevity, we do not handle the failure associated with an empty string. A 
character c is printed as its single-letter string (a singleton list) paired with c. 

Next, we define a biparser int for an integer followed by a single space. An 
auxiliary biparser digits (on the right) parses an integer one digit at a time into 
a string. Note that in Haskell, the do-notation statement "d <- char `upon` head"
desugars to "char `upon` head >>= \d -> ...", which uses (>>=) and a function
binding d in the scope of the rest of the desugared block.


int :: Biparser Int Int                 digits :: Biparser String String
int = do                                digits = do
  ds <- digits `upon` printedInt          d <- char `upon` head
  return (read ds)                        if isDigit d then do
  where                                       igits <- digits `upon` tail
    printedInt n = show n ++ " "              return (d : igits)
                                            else if d == ' ' then return " "
                                            else error "Expected digit or space"


On the right, digits extracts a String consisting of digits followed by a single
space. As a parser, it parses a character (char `upon` head); if it is a digit
then it continues parsing recursively (digits `upon` tail) appending the first
character to the result (d : igits). Otherwise, if the parsed character is a space
the parser returns " ". As a printer, digits expects a non-empty string of the
same format; `upon` head extracts the first character of the input, then char
prints it and returns it back as d; if it is a digit, then `upon` tail extracts
the rest of the input to print recursively. If the character is a space, the printer
returns a space and terminates; otherwise (not digit or space) the printer throws
an error.

On the left, the biparser int uses read to convert an input string of digits 
(parsed by digits) into an integer, and printedInt to convert an integer to an 
output string printed by digits. A safer implementation could return the Maybe 
type when parsing but we keep things simple here for now. 

After parsing an integer n, we can parse the string following it by iterating n 
times the biparser char. This is captured by the replicateBiparser combinator 
below, defined recursively like digits but with the termination condition given 
by an external parameter. To iterate n times a biparser pv: if n == 0, there is
nothing to do and we return the empty list; otherwise for n > 0, we run pv once
to get the head v, and recursively iterate n-1 times to get the tail vs.

Note that although not reflected in its type, replicateBiparser n pv
expects, as a printer, a list l of length n: if n == 0, there is nothing to print; if
n > 0, `upon` head extracts the head of l to print it with pv, and `upon` tail
extracts its tail, of length n-1, to print it recursively.

replicateBiparser :: Int -> Biparser u v -> Biparser [u] [v]
replicateBiparser 0 pv = return []
replicateBiparser n pv = do
  v  <- pv `upon` head
  vs <- (replicateBiparser (n - 1) pv) `upon` tail
  return (v : vs)


(akin to replicateM from Haskell’s standard library). We can now fulfil our task: 


string :: Biparser String String
string = int `upon` length >>= \n -> replicateBiparser n char


Interestingly, if we erase applications of upon, i.e., we substitute every expression
of the form py `upon` f with py and ignore the second parameter of the types,
we obtain what is essentially the definition of a parser in an idiomatic style for
monadic parsing. This is because `upon` f is the identity on the parser component
of Biparser. Thus the biparser code closely resembles standard, idiomatic
monadic parser code but with "annotations" via upon expressing how to apply
the backwards direction of printing to subparts of the parsed string.

Despite its simplicity, the syntax of length-prefixed strings is notably context- 
sensitive. Thus the example makes crucial use of the monadic interface for bidi- 
rectional programming: a value (the length) must first be extracted to dynam- 
ically delimit the string that is parsed next. Context-sensitivity is standard for 
parser combinators in contrast with parser generators, e.g., Yacc, and applicative 
parsers, which are mostly restricted to context-free languages. By our monadic 
BX approach, we can now bring this power to bear on bidirectional parsing. 
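For reference, the definitions of this section can be assembled into one self-contained module, sketched below. The Functor and Applicative instances are additions required by current GHC versions, and the Monad instance is written out as the product of the parser and printer monads described above (the paper defers its definition to the extended version, so the exact form here is an assumption). Prelude's print is hidden to avoid a clash with the record field.

```haskell
import Prelude hiding (print)
import Data.Char (isDigit)

data Biparser u v = Biparser
  { parse :: String -> (v, String)
  , print :: u -> (v, String) }

instance Functor (Biparser u) where
  fmap f (Biparser pa pr) = Biparser (onFst . pa) (onFst . pr)
    where onFst (v, s) = (f v, s)

instance Applicative (Biparser u) where
  pure v = Biparser (\s -> (v, s)) (\_ -> (v, ""))
  pf <*> pv = pf >>= \f -> fmap f pv

-- Product of the parser monad and the printer monad.
instance Monad (Biparser u) where
  Biparser pa pr >>= k = Biparser
    (\s -> let (v, s') = pa s in parse (k v) s')
    (\u -> let (v, s1) = pr u
               (t, s2) = print (k v) u
           in (t, s1 ++ s2))

upon :: Biparser u' v -> (u -> u') -> Biparser u v
upon (Biparser pa pr) f = Biparser pa (pr . f)

char :: Biparser Char Char
char = Biparser (\(c : s) -> (c, s)) (\c -> (c, [c]))

digits :: Biparser String String
digits = do
  d <- char `upon` head
  if isDigit d
    then do
      igits <- digits `upon` tail
      return (d : igits)
    else if d == ' '
      then return " "
      else error "Expected digit or space"

int :: Biparser Int Int
int = do
  ds <- digits `upon` printedInt
  return (read ds)
  where printedInt n = show n ++ " "

replicateBiparser :: Int -> Biparser u v -> Biparser [u] [v]
replicateBiparser 0 _  = return []
replicateBiparser n pv = do
  v  <- pv `upon` head
  vs <- replicateBiparser (n - 1) pv `upon` tail
  return (v : vs)

string :: Biparser String String
string = int `upon` length >>= \n -> replicateBiparser n char
```

With these definitions, parse string "6 lambda calculus" evaluates to ("lambda", " calculus") and print string "SKI" evaluates to ("SKI", "3 SKI"), matching the two unit tests stated earlier. Note that read tolerates the trailing space in the string returned by digits.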


3 A Unifying Structure: Monadic Profunctors 


The biparser examples of the previous section were enabled by both the monadic 
structure of Biparser and the comap operation (also called upon, with flipped 
arguments). We describe a type as being a monadic profunctor when it has both 
a monadic structure and a comap operation (subject to some equations). The 
notion of a monadic profunctor is general, but it characterises a key class of 
structures for bidirectional programs, which we explain here. Furthermore, we 
show a construction of monadic profunctors from pairs of monads which elicits 
the necessary structure for monadic bidirectional programming in the style of 
the previous section. 


Profunctors. In Sect. 2.1, biparsers were defined by a data type with two
type parameters (Biparser u v) which is functorial and monadic in the second
parameter and contravariantly functorial in the first parameter (provided
by the comap operation). In standard terminology, a two-parameter type p which
is functorial in both its type parameters is called a bifunctor. In Haskell, the term
profunctor has come to mean any bifunctor which is contravariant in the first
type parameter and covariant in the second.¹ This differs slightly from the standard
category theory terminology where a profunctor is a bifunctor F : D^op × C → Set.
This corresponds to the Haskell community's use of the term "profunctor"
if we treat Haskell in an idealised way as the category of sets.

1 http://hackage.haskell.org/package/profunctors/docs/Data-Profunctor.html.

We adopt this programming-oriented terminology, capturing the comap opera- 
tion via a class Profunctor. In the preceding section, some uses of comap involved 
a partial function, e.g., comap head. We make the possibility of partiality explicit 
via the Maybe type, yielding the following definition. 


Definition 1. A binary data type is a profunctor if it is a contravariant functor 
in its first parameter and covariant functor in its second, with the operation: 


class ForallF Functor p => Profunctor p where
  comap :: (u -> Maybe u') -> p u' v -> p u v


which should obey two laws: 


comap Just = id        comap (f >=> g) = comap f . comap g


where (>=>) :: (a -> Maybe b) -> (b -> Maybe c) -> (a -> Maybe c) composes
partial functions (left-to-right), captured by Kleisli arrows of the Maybe
monad.

The constraint ForallF Functor p captures a universally quantified constraint
[6]: for all types u, p u has an instance of the Functor class.²

The requirement for comap to take partial functions is in response to 
the frequent need to restrict the domain of bidirectional transformations. In 
combinator-based approaches, combinators typically constrain bidirectional pro- 
grams to be bijections, enforcing domain restrictions by construction. Our more 
flexible approach requires a way to include such restrictions explicitly, hence 
comap. 

Since the contravariant part of the bifunctor applies to functions of type
u -> Maybe u', the categorical analogy here is more precisely a profunctor
F : C_T^op × C → Set, where C_T is the Kleisli category of the partiality (Maybe) monad.


Definition 2. A monadic profunctor is a profunctor p (in the sense of 
Definition 1) such that p u is a monad for all u. In terms of type class con- 
straints, this means there is an instance Profunctor p and for all u there is a 
Monad (p u) instance. Thus, we represent monadic profunctors by the following 
empty class (which inherits all its methods from its superclasses): 


class (Profunctor p, ForallF Monad p) => Profmonad p


Monadic profunctors must obey the following laws about the interaction between 
profunctor and monad operations: 


comap f (return y)  = return y
comap f (py >>= kz) = comap f py >>= (\y -> comap f (kz y))

(for all f :: u -> Maybe v, py :: p v y, kz :: y -> p v z). These laws are
equivalent to saying that comap lifts (partial) functions into monad morphisms.
In Haskell, these laws are obtained for free by parametricity [34]. This means
that every contravariant functor and monad is in fact a monadic profunctor,
thus the following universal instance is lawful:


instance (Profunctor p, ForallF Monad p) => Profmonad p


2 As of GHC 8.6, the QuantifiedConstraints extension allows universal quantification
in constraints, written as forall u. Functor (p u), but for simplicity we use the
constraint constructor ForallF from the constraints package:
http://hackage.haskell.org/package/constraints.


Corollary 1. Biparsers form a monadic profunctor as there is an instance of
Monad (Biparser u) and Profunctor Biparser satisfying the requisite laws.


Lastly, we introduce a useful piece of terminology (mentioned in the previous 
section on biparsers) for describing values of a profunctor of a particular form: 


Definition 3. A value p :: P u v of a profunctor P is called aligned if u = v. 


3.1 Constructing Monadic Profunctors 


Our examples (parsers/printers, lenses, and generators/predicates) share 
monadic profunctors as an abstraction, making it possible to write different 
kinds of bidirectional transformations monadically. Underlying these definitions 
of monadic profunctors is a common structure, which we explain here using 
biparsers, and which will be replayed in Sect.5 for lenses and Sect.6 for bigen- 
erators. 

There are two simple ways in which a covariant functor m (resp. a monad) 
gives rise to a profunctor (resp. a monadic profunctor). The first is by constructing
a profunctor in which the contravariant parameter is discarded, i.e.,
p u v = m v; the second is as a function type from the contravariant parameter u
to m v, i.e., p u v = u -> m v. These are standard mathematical constructions,
and the latter appears in the Haskell profunctors package with the name Star.
Our core construction is based on these two ways of creating a profunctor, which 
we call Fwd and Bwd respectively: 

data Fwd m u v = Fwd { unFwd :: m v }      -- ignore contrv. parameter
data Bwd m u v = Bwd { unBwd :: u -> m v } -- maps from contrv. parameter


The naming reflects the idea that these two constructions will together capture 
a bidirectional transformation and are related by domain-specific round-tripping 
properties in our framework. Both Fwd and Bwd map any functor into a profunctor 
by the following type class instances: 


instance Functor m => Functor (Fwd m u) where
  fmap f (Fwd x) = Fwd (fmap f x)

instance Functor m => Profunctor (Fwd m) where
  comap f (Fwd x) = Fwd x


instance Functor m => Functor (Bwd m u) where
  fmap f (Bwd x) = Bwd ((fmap f) . x)

instance (Monad m, MonadPartial m) => Profunctor (Bwd m) where
  comap f (Bwd x) = Bwd ((toFailure . f) >=> x)

There is an additional constraint here for Bwd, enforcing that the monad m is a
member of the MonadPartial class which we define as:

class MonadPartial m where toFailure :: Maybe a -> m a


This provides an interface for monads which can internalise a notion of failure, 
as captured at the top-level by Maybe in comap. 
Furthermore, Fwd and Bwd both map any monad into a monadic profunctor: 


instance Monad m => Monad (Fwd m u) where
  return x = Fwd (return x)
  Fwd py >>= kz = Fwd (py >>= unFwd . kz)

instance Monad m => Monad (Bwd m u) where
  return x = Bwd (\_ -> return x)
  Bwd my >>= kz = Bwd (\u -> my u >>= (\y -> unBwd (kz y) u))


The product of two monadic profunctors is also a monadic profunctor. This 
follows from the fact that the product of two monads is a monad and the product 
of two contravariant functors is a contravariant functor. 


data (:*:) p q u v = (:*:) { pfst :: p u v, psnd :: q u v }


instance (Monad (p u), Monad (q u)) => Monad ((p :*: q) u) where
  return y = return y :*: return y
  py :*: qy >>= kz = (py >>= pfst . kz) :*: (qy >>= psnd . kz)


instance (ForallF Functor (p :*: q), Profunctor p, Profunctor q)
      => Profunctor (p :*: q) where
  comap f (py :*: qy) = comap f py :*: comap f qy


3.2 Deriving Biparsers as Monadic Profunctor Pairs 


We can redefine biparsers in terms of the above data types, their instances, and 
two standard monads, the state and writer monads: 


type State s a = s -> (a, s)
type WriterT w m a = m (a, w)
type Biparser = Fwd (State String) :*: Bwd (WriterT String Maybe)


The backward direction composes the writer monad with the Maybe monad using 
WriterT (the writer monad transformer, equivalent to composing two monads 
with a distributive law). Thus the backwards component of Biparser corresponds 
to printers (which may fail) and the forwards component to parsers: 


Bwd (WriterT String Maybe) u v  ≅  u -> Maybe (v, String)
Fwd (State String) u v          ≅  String -> (v, String)


For the above code to work in Haskell, the State and WriterT types need to be 
defined via either a data type or newtype in order to allow type class instances on 
partially applied type constructors. We abuse the notation here for simplicity but
define smart constructors and deconstructors for the actual implementation:³


parse :: Biparser u v -> (String -> (v, String))
print :: Biparser u v -> (u -> Maybe (v, String))
mkBP  :: (String -> (v, String)) -> (u -> Maybe (v, String)) -> Biparser u v


The monadic profunctor definition for biparsers now comes for free from the 
constructions in Sect. 3.1 along with the following instance of MonadPartial for 
the writer monad transformer with the Maybe monad: 


instance Monoid w => MonadPartial (WriterT w Maybe) where
  toFailure Nothing  = WriterT Nothing
  toFailure (Just a) = WriterT (Just (a, mempty))


In a similar manner, we will use this monadic profunctor construction to 
define monadic bidirectional transformations for lenses (Sect.5) and bigener- 
ators (Sect. 6). 

The example biparsers from Sect. 2.1 can be easily redefined using the struc- 
ture here. For example, the primitive biparser char becomes: 


char :: Biparser Char Char
char = mkBP (\(c : s) -> (c, s)) (\c -> Just (c, [c]))
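The construction of Sects. 3.1 and 3.2 can be assembled into a compilable sketch as follows. Several choices here are assumptions rather than the paper's implementation: State and WriterT are taken from the transformers package (which ships with GHC) instead of being defined locally, the ForallF Functor superclass of Profunctor is omitted for brevity, Functor and Applicative instances are added because current GHC versions require them, and twoChars is a hypothetical example.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, TypeOperators #-}

import Prelude hiding (print)
import Control.Monad ((>=>))
import Control.Monad.Trans.State (State, state, runState)
import Control.Monad.Trans.Writer (WriterT (..))

-- Simplified class: the ForallF Functor superclass is left out (assumption).
class Profunctor p where
  comap :: (u -> Maybe u') -> p u' v -> p u v

class MonadPartial m where
  toFailure :: Maybe a -> m a

newtype Fwd m u v = Fwd { unFwd :: m v }
newtype Bwd m u v = Bwd { unBwd :: u -> m v }

instance Functor m => Functor (Fwd m u) where
  fmap f (Fwd x) = Fwd (fmap f x)
instance Applicative m => Applicative (Fwd m u) where
  pure = Fwd . pure
  Fwd f <*> Fwd x = Fwd (f <*> x)
instance Monad m => Monad (Fwd m u) where
  Fwd py >>= kz = Fwd (py >>= unFwd . kz)
instance Profunctor (Fwd m) where
  comap _ (Fwd x) = Fwd x

instance Functor m => Functor (Bwd m u) where
  fmap f (Bwd x) = Bwd (fmap f . x)
instance Applicative m => Applicative (Bwd m u) where
  pure x = Bwd (\_ -> pure x)
  Bwd f <*> Bwd x = Bwd (\u -> f u <*> x u)
instance Monad m => Monad (Bwd m u) where
  Bwd my >>= kz = Bwd (\u -> my u >>= \y -> unBwd (kz y) u)
instance (Monad m, MonadPartial m) => Profunctor (Bwd m) where
  comap f (Bwd x) = Bwd ((toFailure . f) >=> x)

data (:*:) p q u v = (:*:) { pfst :: p u v, psnd :: q u v }

instance (Functor (p u), Functor (q u)) => Functor ((p :*: q) u) where
  fmap f (py :*: qy) = fmap f py :*: fmap f qy
instance (Applicative (p u), Applicative (q u)) => Applicative ((p :*: q) u) where
  pure y = pure y :*: pure y
  (pf :*: qf) <*> (py :*: qy) = (pf <*> py) :*: (qf <*> qy)
instance (Monad (p u), Monad (q u)) => Monad ((p :*: q) u) where
  (py :*: qy) >>= kz = (py >>= pfst . kz) :*: (qy >>= psnd . kz)
instance (Profunctor p, Profunctor q) => Profunctor (p :*: q) where
  comap f (py :*: qy) = comap f py :*: comap f qy

instance Monoid w => MonadPartial (WriterT w Maybe) where
  toFailure Nothing  = WriterT Nothing
  toFailure (Just a) = WriterT (Just (a, mempty))

type Biparser = Fwd (State String) :*: Bwd (WriterT String Maybe)

mkBP :: (String -> (v, String)) -> (u -> Maybe (v, String)) -> Biparser u v
mkBP f g = Fwd (state f) :*: Bwd (WriterT . g)

parse :: Biparser u v -> String -> (v, String)
parse = runState . unFwd . pfst

print :: Biparser u v -> u -> Maybe (v, String)
print p = runWriterT . unBwd (psnd p)

char :: Biparser Char Char
char = mkBP (\(c : s) -> (c, s)) (\c -> Just (c, [c]))

-- Hypothetical example: two characters, printed from the parts of a pair.
twoChars :: Biparser (Char, Char) (Char, Char)
twoChars = do
  a <- comap (Just . fst) char
  b <- comap (Just . snd) char
  return (a, b)
```

Here parse twoChars "abc" evaluates to (('a', 'b'), "c") and print twoChars ('x', 'y') evaluates to Just (('x', 'y'), "xy"); none of the monad or profunctor instances mention biparsers specifically.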


Codec library. The codec library [8] provides a general type for bidirectional 
programming isomorphic to our composite type Fwd r :*: Bwd w: 


data Codec r w c a = Codec { codecIn :: r a, codecOut :: c -> w a }


Though the original codec library was developed independently, its current form 
is a result of this work. In particular, we contributed to the package by generalising
its original type (codecOut :: c -> w ()) to the one above, and provided
Monad and Profunctor instances to support monadic bidirectional programming 
with codecs. 


4 Reasoning about Bidirectionality 


So far we have seen how the monadic profunctor structure provides a way to 
define biparsers using familiar operations and syntax: monads and do-notation. 
This structuring allows both the forwards and backwards components of a 
biparser to be defined simultaneously in a single compact definition. 

This section studies the interaction of monadic profunctors with the round- 
tripping laws that relate the two components of a bidirectional program. For 
every bidirectional transformation we can define dual properties: backward round 
tripping (going backwards-then-forwards) and forward round tripping (going 
forwards-then-backwards). In each BX domain, such properties also capture
additional domain-specific information flow inherent to the transformations. We
use biparsers as the running example. We then apply the same principles to our
other examples in Sects. 5 and 6. For brevity, we use Bp as an alias for Biparser.

3 Smart constructors (and dually smart deconstructors) are just functions that hide
boilerplate code for constructing and deconstructing data types.


Definition 4. A biparser p :: Bp u u is backward round tripping if for all x :: u
and s, s' :: String (recalling that print p :: u -> Maybe (v, String)):


fmap snd (print p x) = Just s  ==>  parse p (s ++ s') = (x, s').


That is, if a biparser p when used as a printer (going backwards) on an input 
value x produces a string s, then using p as a parser on a string with prefix s 
and suffix s’ yields the original input value x and the remaining input s’. 

Note that backward round tripping is defined for aligned biparsers (of type 
Bp u u) since the same value x is used as both the input of the printer (typed by 
the first type parameter of Bp) and as the expected output of the parser (typed 
by the second type parameter of Bp). 

The dual property is forward round tripping: a source string s is parsed (going 
forwards) into some value x which when printed produces the initial source s: 


Definition 5. A biparser p :: Bp u u is forward round tripping if for every
x :: u and s :: String we have that:


parse p s = (x, "") ==> fmap snd (print p x) = Just s 


Proposition 1. The biparser char :: Bp Char Char (Sect. 3.2) is both backward
and forward round tripping. Proof by expanding definitions and algebraic
reasoning.
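Definitions 4 and 5 can also be read as executable checks on individual samples. The sketch below is not a proof of Proposition 1; it assumes a direct-style Bp record and introduces the hypothetical helpers backwardRT and forwardRT purely for illustration.

```haskell
import Prelude hiding (print)

-- Direct-style biparser (assumption: stands in for the paper's Bp type).
data Bp u v = Bp
  { parse :: String -> (v, String)
  , print :: u -> Maybe (v, String)
  }

char :: Bp Char Char
char = Bp (\(c : s) -> (c, s)) (\c -> Just (c, [c]))

-- Definition 4 on one sample: if printing x yields s, then parsing
-- s ++ s' must return x and leave s' unconsumed.
backwardRT :: Eq u => Bp u u -> u -> String -> Bool
backwardRT p x s' = case fmap snd (print p x) of
  Just s  -> parse p (s ++ s') == (x, s')
  Nothing -> True   -- premise fails, so the implication holds vacuously

-- Definition 5 on one sample: if parsing s consumes all of s and yields x,
-- then printing x must give back s.
forwardRT :: Bp u u -> String -> Bool
forwardRT p s = case parse p s of
  (x, "") -> fmap snd (print p x) == Just s
  _       -> True   -- premise fails, so the implication holds vacuously
```

Because parse char is undefined on the empty string, forwardRT char should only be applied to non-empty inputs.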


Note, in some applications, forward round tripping is too strong. Here it 
requires that every printed value corresponds to at most one source string. This 
is often not the case as ASTs typically discard formatting and comments so that 
pretty-printed code is lexically different to the original source. However, different 
notions of equality enable more reasonable forward round-tripping properties. 

Although one can check round-tripping properties of biparsers by expand- 
ing their definitions and the underlying monadic profunctor operations, a more 
scalable approach is provided if a round-tripping property is compositional with 
respect to the monadic profunctor operations, i.e., if these operations preserve 
the property. Compositional properties are easier to enforce and check since only 
the individual atomic components need round-tripping proofs. Such properties 
are then guaranteed “by construction” for programs built from those components. 


4.1 Compositional Properties of Monadic Bidirectional 
Programming 


Let us first formalize compositionality as follows. A property R over a monadic
profunctor P is a family of subsets $R^u_v$ of P u v indexed by types u and v.


Definition 6. A property R over a monadic profunctor P is compositional if the
monadic profunctor operations are closed over R, i.e., the following conditions
hold for all types u, v, w:


1. For all x :: v, (return x) ∈ $R^u_v$   (comp-return)
2. For all p :: P u v and k :: v -> P u w,

   (p ∈ $R^u_v$) ∧ (∀v. (k v) ∈ $R^u_w$)  ==>  (p >>= k) ∈ $R^u_w$   (comp-bind)
3. For all p :: P u' v and f :: u -> Maybe u',

   p ∈ $R^{u'}_v$  ==>  (comap f p) ∈ $R^u_v$   (comp-comap)


Unfortunately for biparsers, forward and backward round tripping as defined 
above are not compositional: return is not backward round tripping and >>= 
does not preserve forward round tripping. Furthermore, these two properties are 
restricted to biparsers of type Bp u u (i.e., aligned biparsers) but composition- 
ality requires that the two type parameters of the monadic profunctor can differ 
in the case of comap and (>>=). This suggests that we need to look for more 
general properties that capture the full gamut of possible biparsers. 

We first focus on backward round tripping. Informally, backward round trip- 
ping states that if you print (going backwards) and parse the resulting out- 
put (going forwards) then you get back the initial value. However, in a general 
biparser p :: Bp u v, the input type of the printer u differs from the output type 
of the parser v, so we cannot compare them. But our intent for printers is that 
what we actually print is a fragment of u, a fragment which is given as the output 
of the printer. By thus comparing the outputs of both the parser and printer, 
we obtain the following variant of backward round tripping: 


Definition 7. A biparser p :: Bp u v is weak backward round tripping if for all
x :: u, y :: v, and s, s' :: String:


print p x = Just (y, s)  ==>  parse p (s ++ s') = (y, s')


Removing backward round tripping’s restriction to aligned biparsers and using 
the result y :: v of the printer gives us a property that is compositional: 


Proposition 2. Weak backward round tripping of biparsers is compositional. 
Proposition 3. The primitive biparser char is weak backward round tripping. 


Corollary 2. Propositions 2 & 3 imply that string is weak backward round
tripping.
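Unlike Definition 4, the weak property also makes sense for misaligned biparsers. The following sketch checks Definition 7 on samples for a length-prefix biparser of type Bp String Int; the primitive int (digits terminated by one space) and the names weakBackwardRT and lengthPrefix are assumptions made for this illustration.

```haskell
import Prelude hiding (print)
import Data.Char (isDigit)

-- Direct-style biparser (assumption: stands in for the paper's Bp type).
data Bp u v = Bp
  { parse :: String -> (v, String)
  , print :: u -> Maybe (v, String)
  }

-- Pre-compose the printer with a partial function; the parser is untouched.
comap :: (u -> Maybe u') -> Bp u' v -> Bp u v
comap f p = Bp (parse p) (\u -> f u >>= print p)

-- Assumed primitive: a decimal number followed by a single space, e.g. "6 ".
int :: Bp Int Int
int = Bp (\s -> let (ds, rest) = span isDigit s in (read ds, drop 1 rest))
         (\n -> Just (n, show n ++ " "))

-- A misaligned biparser: prints the length of a string, parses an Int.
lengthPrefix :: Bp String Int
lengthPrefix = comap (Just . length) int

-- Definition 7 on one sample: compare the printer's output value y
-- (not its input x) with the result of parsing.
weakBackwardRT :: Eq v => Bp u v -> u -> String -> Bool
weakBackwardRT p x s' = case print p x of
  Just (y, s) -> parse p (s ++ s') == (y, s')
  Nothing     -> True
```

Here print lengthPrefix "abc" yields Just (3, "3 "), and parsing "3 " followed by any suffix returns 3 and that suffix, so the check passes even though the input "abc" itself is never recovered.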


This property is “weak” as it does not constrain the relationship between the
input u of the printer and its output v. In fact, there is no hope for a
compositional property to do so: the monadic profunctor combinators do not enforce
a relationship between them. However, we can regain compositionality for the
stronger backward round-tripping property by combining the weak
compositional property with an additional non-compositional property on the
relationship between the printer’s input and output. This relationship is represented
by the function that results from ignoring the printed string, which amounts to
removing the main effect of the printer. Thus we call this operation a
purification:


purify :: forall u v. Bp u v -> u -> Maybe v
purify p u = fmap fst (print p u) 


Ultimately, when a biparser is aligned (p :: Bp u u) we want an input to the
printer to be returned in its output, i.e., purify p should equal \x -> Just x.
If this is the case, we recover the original backward round tripping property:


Theorem 1. If p :: P u u is weak backward round tripping, and for all x :: u. 
purify p x = Just x, then p is backward round tripping. 


Thus, for any biparser p, we can get backward round tripping by proving that 
its atomic subcomponents are weak backward round tripping, and proving that 
purify p x = Just x. The interesting aspect of the purification condition here 
is that it renders irrelevant the domain-specific effects of the biparser, i.e., those 
related to manipulating source strings. This considerably simplifies any proof. 
Furthermore, the definition of purify is a monadic profunctor homomorphism 
which provides a set of equations that can be used to expedite the reasoning. 


Definition 8. A monadic profunctor homomorphism between monadic profunctors
P and Q is a polymorphic function proj :: P u v -> Q u v such that:


proj (comap_P f p) = comap_Q f (proj p)
proj (p >>=_P k)   = (proj p) >>=_Q (\x -> proj (k x))
proj (return_P x)  = return_Q x


Proposition 4. The purify :: Bp u v -> u -> Maybe v operation for
biparsers (above) is a monadic profunctor homomorphism between Bp and the
monadic profunctor PartialFun u v = u -> Maybe v.


Corollary 3. (of Theorem 1 with Corollary 2 and Proposition 4) The biparser 
string is backward round tripping. 


Proof. First prove (in Appendix B [36]) the following properties of biparsers
char, int, and replicateBp :: Int -> Bp u v -> Bp [u] [v] (writing proj
for purify):


proj char n = Just n                                      (4.1)
proj int n = Just n                                       (4.2)
proj (replicateBp (length xs) p) xs = mapM (proj p) xs    (4.3)

From these and the homomorphism properties we can prove
proj string = Just:


         proj string xs
         = proj (comap length int >>= \n -> replicateBp n char) xs
Prop. 4  = (comap length (proj int) >>= \n -> proj (replicateBp n char)) xs
(4.2)    = (comap length Just >>= \n -> proj (replicateBp n char)) xs
Def. 2   = proj (replicateBp (length xs) char) xs
(4.3)    = mapM (proj char) xs
(4.1)    = mapM Just xs
{monad}  = Just xs


Combining proj string = Just with Corollary 2 (string is weak backward
round tripping) enables Theorem 1, proving that string is backward round
tripping.
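The equation proj string = Just can be observed on concrete inputs. The sketch below assumes a direct-style Bp record with hand-written Monad and comap definitions, and a primitive int that reads digits terminated by one space; these are illustrative stand-ins, not the definitions of Sect. 3.

```haskell
import Prelude hiding (print)
import Data.Char (isDigit)

data Bp u v = Bp
  { parse :: String -> (v, String)
  , print :: u -> Maybe (v, String)
  }

instance Functor (Bp u) where
  fmap f p = Bp (\s -> let (v, s') = parse p s in (f v, s'))
                (\u -> fmap (\(v, out) -> (f v, out)) (print p u))

instance Applicative (Bp u) where
  pure x = Bp (\s -> (x, s)) (\_ -> Just (x, ""))
  pf <*> px = pf >>= \f -> fmap f px

instance Monad (Bp u) where
  p >>= k = Bp
    (\s -> let (v, s') = parse p s in parse (k v) s')
    (\u -> do
        (v, out1) <- print p u        -- print the first component
        (w, out2) <- print (k v) u    -- continue with its output value
        Just (w, out1 ++ out2))       -- concatenate the printed strings

comap :: (u -> Maybe u') -> Bp u' v -> Bp u v
comap f p = Bp (parse p) (\u -> f u >>= print p)

char :: Bp Char Char
char = Bp (\(c : s) -> (c, s)) (\c -> Just (c, [c]))

-- Assumed primitive: a decimal number followed by a single space.
int :: Bp Int Int
int = Bp (\s -> let (ds, rest) = span isDigit s in (read ds, drop 1 rest))
         (\n -> Just (n, show n ++ " "))

headM :: [a] -> Maybe a
headM (x : _) = Just x
headM []      = Nothing

tailM :: [a] -> Maybe [a]
tailM (_ : xs) = Just xs
tailM []       = Nothing

replicateBp :: Int -> Bp u v -> Bp [u] [v]
replicateBp 0 _ = return []
replicateBp n p = do
  x  <- comap headM p
  xs <- comap tailM (replicateBp (n - 1) p)
  return (x : xs)

-- A length-prefixed string, e.g. "6 lambda".
string :: Bp String String
string = comap (Just . length) int >>= \n -> replicateBp n char

purify :: Bp u v -> u -> Maybe v
purify p u = fmap fst (print p u)
```

Under these assumptions, print string "lambda" gives Just ("lambda", "6 lambda"), parse string "6 lambda calculus" gives ("lambda", " calculus"), and purify string xs returns Just xs, matching the derivation above.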


The other two core examples in this paper also permit a definition of purify. 
We capture the general pattern as follows: 


Definition 9. A purifiable monadic profunctor is a monadic profunctor P with
a homomorphism proj from P to the monadic profunctor of partial functions
- -> Maybe -. We say that proj p is the pure projection of p.


Definition 10. A pure projection proj p :: u -> Maybe v is called the identity
projection when proj p x = Just x for all x :: u.


Here and in Sects. 5 and 6, identity projections enable compositional
round-tripping properties to be derived from more general non-compositional
properties, as seen above for backward round tripping of biparsers.

We have neglected forward round tripping, which is not compositional, not 
even in a weakened form. However, we can generalise compositionality with con- 
ditions related to injectivity, enabling a generalisation of forward round tripping. 
We call the generalised meta-property quasicompositionality. 


4.2 Quasicompositionality for Monadic Profunctors 


An injective function $f : A \to B$ is a function for which there exists a left inverse
$f^{-1} : B \to A$, i.e., where $f^{-1} \circ f = \mathit{id}$. We can see this pair of functions as
a simple kind of bidirectional program, with a forward round-tripping property
(assuming f is the forwards direction). We can lift the notion of injectivity to
the monadic profunctor setting and capture forward round-tripping properties
that are preserved by the monadic profunctor operations, given some additional
injectivity-like restriction. We first formalise the notion of an injective arrow.

Informally, an injective arrow k :: v -> m w produces an output from which
the input can be recalculated:


Definition 11. Let m be a monad. A function k :: v -> m w is an injective
arrow if there exists k' :: w -> v (the left arrow inverse of k) such that for all
x :: v:

k x >>= \y -> return (x, y)  =  k x >>= \y -> return (k' y, y)
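To make Definition 11 concrete, the sketch below computes both sides of the equation in the Maybe monad. The functions lhs and rhs and the two example arrows are hypothetical names introduced only for this illustration.

```haskell
-- Left-hand side of Definition 11: pair the output with the real input.
lhs :: Monad m => (v -> m w) -> v -> m (v, w)
lhs k x = k x >>= \y -> return (x, y)

-- Right-hand side: pair the output with the input recomputed by k'.
rhs :: Monad m => (v -> m w) -> (w -> v) -> v -> m (v, w)
rhs k k' x = k x >>= \y -> return (k' y, y)

-- An injective arrow: the input can be recovered by subtracting one.
succArrow :: Int -> Maybe Int
succArrow n = Just (n + 1)

-- Not an injective arrow: the parity forgets which number was given.
parityArrow :: Int -> Maybe Bool
parityArrow n = Just (even n)
```

For succArrow the candidate inverse subtract 1 makes both sides agree on every input. For parityArrow no candidate can work, since inputs 0 and 2 produce the same output True but would need different recovered inputs.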


Next, we define quasicompositionality which extends the compositionality 
meta-property with the requirement for >>= to be applied to injective arrows: 


Definition 12. Let P be a monadic profunctor. A property $R^u_v$ ⊆ P u v indexed
by types u and v is quasicompositional if the following holds:


1. For all x :: v, (return x) ∈ $R^u_v$   (qcomp-return)
2. For all p :: P u v, k :: v -> P u w, if k is an injective arrow,

   (p ∈ $R^u_v$) ∧ (∀v. (k v) ∈ $R^u_w$)  ==>  (p >>= k) ∈ $R^u_w$   (qcomp-bind)
3. For all p :: P u' v, f :: u -> Maybe u',

   p ∈ $R^{u'}_v$  ==>  (comap f p) ∈ $R^u_v$   (qcomp-comap)


We now formulate a weakening of forward round tripping. As with weak
backward round tripping, we rely on the idea that the printer outputs both a string
and the value that was printed, so that we need to compare the outputs of both
the parser and the printer, as opposed to comparing the output of the parser
with the input of the printer as in (strong) forward round tripping. If running the
parser component of a biparser on a string s01 yields a value y and a remaining
string s1, and the printer outputs that same value y along with a string s0, then
s0 is the prefix of s01 that was consumed by the parser, i.e., s01 = s0 ++ s1.


Definition 13. A biparser p :: Bp u v is weak forward round tripping if for all
x :: u, y :: v, and s0, s1, s01 :: String:


parse p s01 = (y, s1) ∧ print p x = Just (y, s0)  ==>  s01 = s0 ++ s1


Proposition 5. Weak forward round tripping is quasicompositional.


Proof. We sketch the qcomp-bind case, where p = (m >>= k) for some m and k
that are weak forward round tripping. From parse (m >>= k) s01 = (y, s1),
it follows that there exist z, s such that parse m s01 = (z, s) and
parse (k z) s = (y, s1). Similarly, print (m >>= k) x = Just (y, s0) implies
that there exist z', s0', s1' such that print m x = Just (z', s0'),
print (k z') x = Just (y, s1'), and s0 = s0' ++ s1'. Because k is an injective
arrow, we have z = z' (see appendix). We then use the assumption that m and
k z are weak forward round tripping, and deduce that s01 = s0' ++ s and
s = s1' ++ s1, therefore s01 = s0 ++ s1.


Proposition 6. The char biparser is weak forward round tripping. 




Corollary 4. Propositions 5 and 6 imply that string is weak forward round 
tripping if we restrict the parser to inputs whose digits do not contain redundant 
leading zeros. 


Proof. All of the right operands of >>= in the definition of string are injective
arrows, apart from \ds -> return (read ds) at the end of the auxiliary int
biparser. Indeed, the read function is not injective since multiple strings may
parse to the same integer: read "0" = read "00" = 0. But the pre-condition to the
proposition (no redundant leading zero digits) restricts the input strings so that
read is injective. The rest of the proof is a corollary of Propositions 5 and 6.
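The failure of injectivity, and its repair on the restricted domain, can be checked directly. In the sketch below, canonicalDigits is a hypothetical formalisation of "no redundant leading zeros" and is not taken from the paper.

```haskell
import Data.Char (isDigit)

-- read collapses redundant leading zeros, so two different strings collide.
collide :: Bool
collide = (read "0" :: Int) == read "00"

-- Hypothetical predicate for the restricted domain of Corollary 4:
-- non-empty digit strings that are either "0" or do not start with '0'.
canonicalDigits :: String -> Bool
canonicalDigits ds =
  not (null ds) && all isDigit ds && (ds == "0" || take 1 ds /= "0")

-- On the restricted domain, show acts as a left inverse of read.
roundTrips :: String -> Bool
roundTrips ds = show (read ds :: Int) == ds
```

On inputs accepted by canonicalDigits (and small enough to fit in an Int), roundTrips holds, whereas an input such as "007" is rejected by the predicate and indeed fails to round trip.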


Thus, quasicompositionality gives us scalable reasoning for weak forward
round tripping, which holds by construction for biparsers: we just need to prove this
property for individual atomic biparsers. Similarly to backward round tripping,
we can prove forward round tripping by combining weak forward round tripping
with the identity projection property:


Theorem 2. If p :: P u u is weak forward round-tripping, and for all x :: u, 
purify p x = Just x, then p is forward round tripping. 


Corollary 5. The biparser string is forward round tripping by the above theo- 
rem (with identity projection shown in the proof of Corollary 3) and Corollary 4. 


In summary, for any BX we can consider two round-tripping properties:
forwards-then-backwards and backwards-then-forwards, called just forward and backward
here respectively. Whilst combinator-based approaches can guarantee
round-tripping by construction, we have made a trade-off to get greater expressivity in
the monadic approach. However, we regain the ability to reason about bidirec- 
tional transformations in a manageable, scalable way if round-tripping properties 
are compositional. Unfortunately, due to the monadic profunctor structuring, 
this tends not to be the case. Instead, weakened round-tripping properties can 
be compositional or quasicompositional (adding injectivity). In such cases, we 
recover the stronger property by proving a simple property on aligned transfor- 
mations: that the backwards direction faithfully reproduces its input as its out- 
put (identity projection). Appendix C in our extended manuscript [36] compares 
this reasoning approach to a proof of backwards round tripping for separately 
implemented parsers and printers (not using our combined monadic approach). 


5 Monadic Bidirectional Programming for Lenses 


Lenses are a common object of study in bidirectional programming, comprising
a pair of functions (get : S -> V, put : V -> S -> S) satisfying the well-behaved
lens laws shown in Sect. 1. Previously, when considering the monadic structure
of parsers and printers, the starting point was that parsers already have a well- 
known monadic structure. The challenge came in finding a reasonable monadic 
characterisation for printers that was compatible with the parser monad. In the 
end, this construction was expressed by a product of two monadic profunctors
Fwd m and Bwd n for monads m and n. For lenses we are in the same position: the
forwards direction (get) is already a monad, namely the reader monad. The backwards
direction put is not a monad since it is contravariant in its parameter; the same
situation as printers. We can apply the same approach of “monadisation” used
for parsers and printers, giving the following new data type for lenses:


data L s u v = L { get :: s -> v, put :: u -> s -> (v, s) }


The result of put is paired with a covariant parameter v (the result type of get) 
in the same way as monadic printers. Instead of mapping a view and a source 
to a source, put now maps values of a different type u, which we call a pre-view, 
along with a source s into a pair of a view v and source s. This definition can be 
structured as a monadic profunctor via a pair of Fwd and Bwd constructions: 


type L s = (Fwd (Reader s)) :*: (Bwd (State s)) 


Thus by the results of Sect. 3, we now have a monadic profunctor characterisation 
of lenses that allows us to compose lenses via the monadic interface. 

Ideally, get and put should be total, but this is impossible without a way 
to restrict the domains. In particular, there is the known problem of “duplica- 
tion” [23], where source data may appear more than once in the view, and a 
necessary condition for put to be well-behaved is that the duplicates remain 
equal amid view updates. This problem is inherent to all bidirectional transfor- 
mations, and bidirectional languages have to rule out inconsistent updates of 
duplicates either statically [13] or dynamically [23]. To remedy this, we capture 
both partiality of get and a predicate on sources in put for additional dynamic 
checking. This is provided by the following Fwd and Bwd monadic profunctors: 


type ReaderT r m a = r -> m a
type StateT s m a = s -> m (a, s)
type WriterT w m a = m (a, w)


type L s = (Fwd (ReaderT s Maybe))
       :*: (Bwd (StateT s (WriterT (s -> Bool) Maybe)))


-- Smart deconstructors:
get :: L s u v -> (s -> Maybe v)
put :: L s u v -> (u -> s -> Maybe ((v, s), s -> Bool))


Going forwards, getting a view v from a source s may fail if there is no view for
the current source. Going backwards, putting a pre-view u updates some source s
(via the state transformer StateT s), but with some further structure returned,
provided by WriterT (s -> Bool) Maybe (similar to the writer transformer used
for biparsers, Sect. 3.2). The Maybe here captures the possibility that put can
fail. The WriterT (s -> Bool) structure provides a predicate which detects the
“duplication” issue mentioned earlier. Informally, the predicate can be used to
check that previously modified locations in the source are not modified again.
For example, if a lens has a source made up of a bit vector, and a put sets bit i
to 1, then the returned predicate will return True for all bit vectors where bit i is
1, and False otherwise. This predicate can then be used to test whether further
put operations on the source have modified bit i.

Similarly to biparsers, a pre-view u can be understood as containing the view 
v that is to be merged with the source, and which is returned with the updated 
source. Ultimately, we wish to form lenses of matching input and output types 
(i.e. L s v v) satisfying the standard lens well-behavedness laws, modulo explicit 
management of partiality via Maybe and testing for conflicts via the predicate: 


put l x s = Just ((_, s'), p) ∧ p s'  ==>  get l s' = Just x     (L-PutGet)
get l s = Just x  ==>  put l x s = Just ((_, s), _)              (L-GetPut)


L-PutGet and L-GetPut are backward and forward round tripping respectively.
Some lenses, such as the later example, are not defined for all views. In that case
we may say that the lens is backward/forward round tripping in some subset
P ⊆ u when the above properties only hold when x is an element of P.

For every source type s, the lens type L s is automatically a monadic profunctor
by its definition as the pairing of Fwd and Bwd (Sect. 3.1), and the following
instance of MonadPartial for handling failure and instance of Monoid to satisfy
the requirements of the writer monad:


instance MonadPartial (StateT s (WriterT (s -> Bool) Maybe)) where
  toFailure Nothing  = StateT (\_ -> WriterT Nothing)
  toFailure (Just x) = StateT (\s -> WriterT (Just ((x, s), mempty)))


instance Monoid (s -> Bool) where
  mempty = \_ -> True
  mappend h j = \s0 -> h s0 && j s0
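When reproducing the Monoid instance above with a recent GHC, two points may need attention: since GHC 8.4, Monoid has Semigroup as a superclass, and base already ships a pointwise instance Monoid b => Monoid (a -> b) that an instance on s -> Bool would overlap with. One way to sidestep both, sketched here with a hypothetical newtype Check, is:

```haskell
-- Hypothetical wrapper around source predicates, so that the instance does
-- not clash with the pointwise function instances provided by base.
newtype Check s = Check { runCheck :: s -> Bool }

-- Combining two checks requires both predicates to hold.
instance Semigroup (Check s) where
  Check h <> Check j = Check (\s0 -> h s0 && j s0)

-- The empty check accepts every source.
instance Monoid (Check s) where
  mempty = Check (const True)
```

The behaviour is the same as in the text: mempty accepts everything and combination is conjunction, so the writer accumulates all predicates produced by successive put operations.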


A simple lens example operates on key-value maps. For keys of type Key and 
values of type Value, we have the following source type and a simple lens: 


type Src = Map Key Value 


atKey :: Key -> L Src Value Value    -- Key-focussed lens
atKey k = mkLens (lookup k)
  (\v -> \map -> Just ((v, insert k v map), \m' -> lookup k m' == Just v))


The get component of the atKey lens does a lookup of the key k in a map, 
producing Maybe of a Value. The put component inserts a value for key k. When 
the key already exists, put overwrites its associated value. 

Due to our approach, multiple calls to atKey can be composed monadically, 
giving a lens that gets/sets multiple key-value pairs at once. The list of keys and 
the list of values are passed separately, and are expected to be the same length. 


atKeys :: [Key] -> L Src [Value] [Value]
atKeys [] = return []
atKeys (k : ks) = do
  x  <- comap headM (atKey k)      -- headM :: [a] -> Maybe a
  xs <- comap tailM (atKeys ks)    -- tailM :: [a] -> Maybe [a]
  return (x : xs)

We refer interested readers to our implementation [12] for more examples,
including further examples involving trees.
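For readers who want to experiment with atKeys without the full library, the following self-contained sketch unfolds the StateT/WriterT stack by hand and uses an association list instead of Map. The record L, the instances, and the insert helper are assumptions made for illustration, not the implementation in [12].

```haskell
-- Direct-style lens: get may fail; put returns the view, the new source,
-- and a predicate that later sources are expected to satisfy.
data L s u v = L
  { get :: s -> Maybe v
  , put :: u -> s -> Maybe ((v, s), s -> Bool)
  }

instance Functor (L s u) where
  fmap f l = L (fmap f . get l)
               (\u s -> fmap (\((v, s'), q) -> ((f v, s'), q)) (put l u s))

instance Applicative (L s u) where
  pure x = L (\_ -> Just x) (\_ s -> Just ((x, s), const True))
  lf <*> lx = lf >>= \f -> fmap f lx

instance Monad (L s u) where
  l >>= k = L
    (\s -> get l s >>= \v -> get (k v) s)
    (\u s -> do
        ((v, s1), q1) <- put l u s            -- run the first put
        ((w, s2), q2) <- put (k v) u s1       -- thread the updated source
        Just ((w, s2), \t -> q1 t && q2 t))   -- conjoin both predicates

comap :: (u -> Maybe u') -> L s u' v -> L s u v
comap f l = L (get l) (\u s -> f u >>= \u' -> put l u' s)

type Key = String
type Value = Int
type Src = [(Key, Value)]   -- assumption: association list instead of Map

insert :: Key -> Value -> Src -> Src
insert k v m = (k, v) : filter ((/= k) . fst) m

atKey :: Key -> L Src Value Value
atKey k = L (lookup k)
            (\v m -> Just ((v, insert k v m), \m' -> lookup k m' == Just v))

headM :: [a] -> Maybe a
headM (x : _) = Just x
headM []      = Nothing

tailM :: [a] -> Maybe [a]
tailM (_ : xs) = Just xs
tailM []       = Nothing

atKeys :: [Key] -> L Src [Value] [Value]
atKeys []       = return []
atKeys (k : ks) = do
  x  <- comap headM (atKey k)
  xs <- comap tailM (atKeys ks)
  return (x : xs)
```

With distinct keys, the predicate returned by put holds on the final source. With a duplicated key, as in put (atKeys ["a", "a"]) [1, 2] [], the second update overwrites the first and the returned predicate evaluates to False on the final source, which is the dynamic detection of conflicting updates described in the text.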


Round tripping. We apply the reasoning framework of Sect. 4, taking the stan- 
dard lens laws as the starting point (neither of which are compositional). 

We first weaken backward round tripping to be compositional. Informally,
the property expresses the idea that if we put some value x in a source s,
resulting in a source s', then what we get from s' is x. However two important
changes are needed to adapt to our generalised type of lenses and to ensure
compositionality. First, the value x that was put is now to be found in the output
of put, whereas there is no way to constrain the input of put because its type
v is abstract. Second, by sequentially composing lenses such as in l >>= k, the
output source s' of put l will be further modified by put (k x), so this
round-tripping property must constrain all potential modifications of s'. In fact, the
predicate p ensures exactly that the view get l has not changed and is still x. It
is not even necessary to refer to s', which is just one source for which we expect
p to be True.


Definition 14. A lens l :: L s u v is weak backward round tripping if for all
x :: u, y :: v, for all sources s, s', and for all p :: s -> Bool, we have:


put l x s = Just ((y, _), p) ∧ p s'  ==>  get l s' = Just y


Theorem 3. Weak backward round tripping is a compositional property.


Again, we complement this weakened version of round tripping with the 
notion of purification. 


Proposition 7. Our lens type L is a purifiable monadic profunctor
(Definition 9), with a family of pure projections proj s indexed by a source s, defined:


proj :: s -> L s u v -> (u -> Maybe v)
proj s = \l u -> fmap (fst . fst) (put l u s)


Theorem 4. If a lens l :: L s u u is weak backward round tripping and has
identity projections on some subset P ⊆ u (i.e., for all s, x, x ∈ P ==>
proj s l x = Just x) then l is also backward round tripping on all x ∈ P.


To demonstrate, we apply this result to atKeys :: [Key] -> L Src [Value] [Value].

Proposition 8. The lens atKey k is weak backward round tripping.

Proposition 9. The lens atKey k has identity projection: proj z (atKey k) = Just.


Our lens atKeys ks is therefore weak backward round tripping by construction.
We now interpret/purify atKeys ks as a partial function, which is actually
the identity function when restricted to lists of the same length as ks.


Proposition 10. For all vs :: [Value] such that length vs = length ks, and
for all s :: Src, proj s (atKeys ks) vs = Just vs.


Corollary 6. By the above results, atKeys ks :: L Src [Value] [Value] for 
all ks is backward round tripping on lists of length length ks. 


The other direction, forward round tripping, follows a similar story. We first 
restate it as a quasicompositional property. 


Definition 15. A lens l :: L s u v is weak forward round tripping if for all
x :: u, y :: v, for all sources s, s', and for all p :: s -> Bool, we have:


get l s = Just y ∧ put l x s = Just ((y, s'), _)  ==>  s = s'


Theorem 5. Weak forward round tripping is a quasicompositional property.


Along with identity projection, this gives the original forward L-GetPut 
property. 


Theorem 6. If a lens l is weak forward round tripping and has identity
projections on some subset P (i.e., for all s, x, x ∈ P ==> proj s l x = Just x)
then l is also forward round tripping on P.


We can thus apply this result to our example (details omitted). 


Proposition 11. For all ks, the lens atKeys ks :: L Src [Value] [Value] is 
forward round tripping on lists of length length ks. 


6 Monadic Bidirectional Programming for Generators 


Lastly, we capture the novel notion of bidirectional generators
(bigenerators) extending random generators in property-based testing frameworks like
QuickCheck [10] to a bidirectional setting. The forwards direction generates
values conforming to a specification; the backwards direction checks whether values
conform to a predicate. We capture the two together via our monadic profunctor
pair as:

type G = (Fwd Gen) :*: (Bwd Maybe)

-- ... with deconstructors and constructors


generate :: G u v -> Gen v                      -- forward direction
check    :: G u v -> u -> Maybe v               -- backward direction
mkG      :: Gen v -> (u -> Maybe v) -> G u v


The forwards direction of a bigenerator is a generator, while the backwards
direction is a partial function u -> Maybe v. A value G u v represents a subset
of v, where generate is a generator of values in that subset and check maps
pre-views u to members of the generated subset. In the backwards direction,
check g defines a predicate on u, which is true if and only if check g u is Just of
some value. The function toPredicate extracts this predicate from the backward
direction:

toPredicate :: G u v -> u -> Bool
toPredicate g x = case check g x of Just _ -> True; Nothing -> False


The bigenerator type G is automatically a monadic profunctor due to our
construction (Sect. 3). Thus, monad and profunctor instances come for free, modulo
(un)wrapping of constructors and given a trivial instance of MonadPartial:


instance MonadPartial Maybe where toFailure = id 


Due to space limitations, we refer readers to Appendix E [36] for an example of 
a compositionally-defined bigenerator that produces binary search trees. 


Round tripping. A random generator can be interpreted as the set of values it 
may generate, while a predicate represents the set of values satisfying it. For a 
bigenerator g, we write x ∈ generate g when x is a possible output of the
generator. The generator of a bigenerator g should match its predicate toPredicate g.
This requirement equates to round-tripping properties: a bigenerator is sound if 
every value which it can generate satisfies the predicate (forward round tripping); 
a bigenerator is complete if every value which satisfies the predicate can be gen- 
erated (backward round tripping). Completeness is often more important than 
soundness in testing because unsound tests can be filtered out by the predicate, 
but completeness determines the potential adequacy of testing. 


Definition 16. A bigenerator g :: G u u is complete (backward round tripping)
when toPredicate g x = True implies x ∈ generate g.


Definition 17. A bigenerator g :: G u u is sound (forward round tripping) if
for all x :: u, x ∈ generate g implies that toPredicate g x = True.


Similarly to backward round tripping of biparsers and lenses, completeness can 
be split into a compositional weak completeness and a purifiable property. 

As before, the compositional weakening of completeness relates the forward 
and backward components by their outputs, which have the same type. 


Definition 18. A bigenerator g :: G u v is weak-complete when
check g x = Just y  ==>  y ∈ generate g.


Theorem 7. Weak completeness is compositional.


In a separate step, we connect the input of the backward direction, i.e., the 
checker, by reasoning directly about its pure projection (via a more general 
form of identity projection) which is defined to be the checker itself: 


Theorem 8. A bigenerator g :: G u u is complete if it is weak-complete and its
checker satisfies a pure projection property: check g x = Just x'  ==>  x = x'.


Thus to prove completeness of a bigenerator g :: G u u, we first have
weak-completeness by construction, and we can then show that check g is a restriction
of the identity function, interpreting all bigenerators simply as partial functions.

Considering the other direction, soundness, there is unfortunately no
decomposition into a quasicompositional property and a property on pure projections.
To see why, let bool be a random uniform bigenerator of booleans, then con- 
sider for example, comap isTrue bool and comap isTrue (return True), where 
isTrue True = Just True and isTrue False = Nothing. Both satisfy any qua- 
sicompositional property satisfied by bool, and both have the same pure pro- 
jection isTrue, and yet the former is unsound—it can generate False, which is 
rejected by isTrue—while the latter is sound. This is not a problem in practice, 
as unsoundness, especially in small scale, is inconsequential in testing. But it 
does raise an intellectual challenge and an interesting point in the design space, 
where ease of reasoning has been traded for greater expressivity in the monadic 
approach. 
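The counterexample can be replayed in a small model. The sketch below replaces QuickCheck's Gen by the list of values a generator may produce, which is an assumption made only to keep the example self-contained; returnG, sound, and the list-based G are likewise illustrative names rather than the paper's definitions.

```haskell
-- Bigenerator model: the possible outputs of the generator, plus the checker.
-- Assumption: a list of possible values stands in for QuickCheck's Gen.
data G u v = G
  { generate :: [v]
  , check    :: u -> Maybe v
  }

-- The return of the monadic profunctor: one possible output, always accepted.
returnG :: v -> G u v
returnG x = G [x] (\_ -> Just x)

-- comap only affects the backward direction.
comap :: (u -> Maybe u') -> G u' v -> G u v
comap f g = G (generate g) (\u -> f u >>= check g)

toPredicate :: G u v -> u -> Bool
toPredicate g x = case check g x of Just _ -> True; Nothing -> False

-- Uniform booleans: both values can be generated, and both are accepted.
bool :: G Bool Bool
bool = G [False, True] Just

isTrue :: Bool -> Maybe Bool
isTrue True  = Just True
isTrue False = Nothing

-- Definition 17 in this model: every possible output satisfies the predicate.
sound :: G u u -> Bool
sound g = all (toPredicate g) (generate g)
```

In this model, comap isTrue bool and comap isTrue (returnG True) have the same checker on both booleans, yet only the second is sound, which mirrors the argument that soundness cannot be recovered from the pure projection alone.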


7 Discussion and Related Work 


Bidirectional transformations are a widely applicable technique used in many 
domains [11]. Among language-based solutions, the lens framework is most influ- 
ential [3,4,13,14,24,29]. Broadly speaking, combinators are used as program- 
ming constructs with which complex lenses are created by combining simpler 
ones. The combinators preserve round tripping, and therefore the resulting pro- 
grams are correct by construction. A problem with lens languages is that they 
tend to be disconnected from more general programming. Lenses can only be con- 
structed by very specialised combinators and are not subject to existing abstrac- 
tion mechanisms. Our approach allows bidirectional transformations to be built 
using standard components of functional programming, and gives a reasoning 
framework for studying compositionality of round-tripping properties. 

The framework of applicative lenses [18] uses a function representation of
lenses to lift the point-free restriction of the combinator-based languages, and
enables bidirectional programming with explicit recursion and pattern matching.
Note that the use of “applicative” in applicative lenses refers to the traditional
sense of programming with λ-abstractions and functional applications, which is
not directly related to applicative functors. In a subsequent work, the authors
developed a language known as HOBiT [20], which went further in featuring
proper binding of variables. Despite the success in supporting λ-abstractions and
function applications in programming bidirectional transformations, none of the
languages have explored advanced patterns such as monadic programming.

The work on monadic lenses [1] investigates lenses with effects. For instance,
a “put” could require additional input to resolve conflicts. Representing effects 
with monads helps reformulate the laws of round-tripping. In contrast, we made 
the type of lenses itself a monad, and showed how they can be composed monad- 
ically. Our method is applicable to monadic lenses, yielding what one might call 
monadic monadic lenses: monadically composable lenses with monadic effects. 
We conjecture that laws for monadic lenses can be adapted to this setting with 
similar compositionality properties, reusing our reasoning framework. 

Other work leverages profunctors for bidirectionality. Notably, a Profunctor
optic [26] between a source type s and a view type v is a function of type
p v v -> p s s, for an abstract profunctor p. Profunctor optics and our monadic
profunctors offer orthogonal composition patterns: profunctor optics can be 
composed “vertically” using function composition, whereas monadic profunctor 
composition is “horizontal” providing sequential composition. In both cases, com- 
position in the other direction can only be obtained by breaking the abstraction. 

It is folklore in the Haskell community that profunctors can be combined 
with applicative functors [22]. The pattern is sometimes called a monoidal pro- 
functor. The codec library [8] mentioned in Sect. 3 prominently features two 
applications of this applicative programming style: binary serialisation (a form 
of parsing/printing) and conversion to and from JSON structures (analogous 
to lenses above). Opaleye [28], an EDSL of SQL queries for Postgres databases, 
uses an interface of monoidal profunctors to implement generic operations such as 
transformations between Haskell datatypes and database queries and responses. 

Our framework adapts gracefully to applicative programming, a restricted 
form of monadic programming. By separating the input type from the output 
type, we can reuse the existing interface of applicative functors without modifi- 
cation. Besides our generalisation to monads, purification and verifying round- 
tripping properties via (quasi)compositionality are novel in our framework. 

Rendel and Ostermann proposed an interface for programming parsers and 
printers together [30], but they were unable to reuse the existing structure of 
Functor, Applicative and Alternative classes (because of the need to han- 
dle types that are both covariant and contravariant), and had to reproduce the 
entire hierarchy separately. In contrast, our approach reuses the standard type 
class hierarchy, further extending the expressive power of bidirectional program- 
ming in Haskell. FliPpr [17,19] is an invertible language that generates a parser 
from a definition of a pretty printer. In this paper, our biparser definitions are 
more similar to those of parsers than printers. This makes sense as it has been 
established that many parsers are monadic. Similar to the case of HOBiT, there 
is no discussion of monadic programming in the FliPpr work. 

Previous approaches to unifying random generators and predicates mostly 
focused on deriving generators from predicates. One general technique evaluates 
predicates lazily to drive generation (random or enumerative) [7,9], but one loses 
control over the resulting distribution of generated values. Luck [15] is a domain- 
specific language blending narrowing and constraint solving to specify generators 
as predicates with user-provided annotations to control the probability distribu- 
tion. In contrast, our programs can be viewed as generators annotated with left 
inverses with which to derive predicates. This reversed perspective comes with 
trade-offs: high-level properties would be more naturally expressed in a declara- 
tive language of predicates, whereas it is a priori more convenient to implement 
complex generation strategies in a specialised framework for random generators. 


Conclusions. This paper advances the expressive power of bidirectional program- 
ming; we showed that the classic bidirectional patterns of parsers/printers and 
lenses can be restructured in terms of monadic profunctors to provide sequential 
composition, with associated reasoning techniques. This opens up a new area 
in the design of embedded domain-specific languages for BX programming, that 
does not restrict programmers to stylised interfaces. Our example of bigenerators
broadened the scope of BX programming from transformations (converting
between two data representations) to non-transformational applications. 

To demonstrate the applicability of our approach to real code, we have devel- 
oped two bidirectional libraries [12], one extending the attoparsec monadic parser 
combinator library to biparsers and one extending QuickCheck to bigenerators. 
One area for further work is studying biparsers with lookahead. Currently, lookahead
can be expressed in our extended attoparsec, but its interaction
with (quasi)compositional round-tripping is not yet understood.

However, this is not the final word on sequentially composable BX programs. 
In all three applications, round-tripping properties are similarly split into weak 
round tripping, which is weaker than the original property but compositional, 
and purifiable, which is equationally friendly. An open question is whether an 
underlying structure can be formalised, perhaps based on an adjunction model, 
that captures bidirectionality even more concretely than monadic profunctors. 

Abstract. Site-graph rewriting languages, such as Kappa or BNGL,
offer parsimonious ways to describe highly combinatorial systems of 
mechanistic interactions among proteins. These systems may then be
simulated efficiently. Yet modeling mechanisms that involve counting
(a number of phosphorylated sites for instance) require an exponential 
number of rules in Kappa. In BNGL, updating the set of the potential 
applications of rules in the current state of the system comes down to 
the sub-graph isomorphism problem (which is NP-complete). 

In this paper, we extend Kappa to deal both parsimoniously and effi- 
ciently with counters. We propose a single push-out semantics for Kappa 
with counters. We show how to compile Kappa with counters into Kappa 
without counters (without requiring an exponential number of rules). 
We design a static analysis, based on affine relationships, to identify the 
meaning of counters and bound their ranges accordingly. 


1 Introduction 


Site-graph rewriting is a paradigm for modeling mechanistic interactions among 
proteins. In Kappa [18] and BNGL [3,40], rewriting rules describe how instances 
of proteins may bind and unbind, and how each protein may activate the
interaction sites of the others by changing their properties. Sophisticated
signaling cascades may be described. The long-term behavior of such models
usually emerges from competition for shared resources, proteins with multiple
phosphorylation sites, scaffolds, separation of scales, and non-linear feedback
loops.

It is often desirable to add more structure to states in order to describe 
generic mechanisms more compactly. In this paper, we consider extending Kappa 
with counters with numerical values. As opposed to the properties of classical 
Kappa sites, which offer no structure, counters allow for expressive preconditions 
(such as the value of a counter is less than 2), but also for generic update 
functions (such as incrementing or decrementing the current value of a counter 
by a given value independently of its current value). Without counters, such 
Fig. 1. Three representations for the phosphorylation of a site. We assume that the
rate of phosphorylation of a site in a protein in which exactly k sites are already 
phosphorylated, is equal to the value f(k). The function f is left as a parameter of 
the model. In (a), we do not use counters. In order to get the number of sites that are 
already phosphorylated, we have to document the state of all the sites of the protein. 
In this rule, there are exactly 2 sites already phosphorylated, thus the rate of the 
rule is equal to f(2). In (b), we use a counter to encode the number of sites already 
phosphorylated. The variable k, that is introduced by the notation @k, contains the 
number of sites that are phosphorylated before the application of the rule. Thus, the 
rate of the rule is equal to f(k). In the right hand side, the notation +1 indicates that 
the counter is incremented at each application of the rule. The rule in (b) summarizes 
exactly 8 rules of the kind of the one in (a) (it defines the phosphorylation of the site 
a regardless of the states of the three other phosphorylation sites). In (c), we abstract 
away the sites and keep only the counter. The notation @k binds the variable k to the 
value of the counter. The left hand side also indicates that the rule may be applied 
only if the value of the counter is less than or equal to 3 (so that at least one site is not 
already phosphorylated). The right hand side specifies that the value of the counter 
is incremented at each application of the rule and that after the application of a rule, 
the value of the counter is always less than or equal to 4. The rule in (c) stands for 32 
rules of the kind of the one in (a) (it depends neither on which site is phosphorylated, 
nor on the state of the three other sites). 


update functions would require one rule per potential value of the counter. This 
raises efficiency issues for the simulation and also blurs any potential reasoning 
on the causality of the system. 

However, adding counters cannot be done without consequences. The efficiency
of Kappa simulations mainly relies on two ingredients. Firstly, Kappa
graphs are rigid [16,39]: an embedding from a connected site-graph into a
site-graph, when it exists, is fully determined by the image of one node. Thanks to
rigidity, searching for the occurrences of a sub-graph in another graph (up to
isomorphism) may be done without backtracking (once a first node has been
placed), and embeddings can be described in memory very concisely. Secondly,
the representation of the set of potential applications of rules relies on a
categorical construction [6] that optimizes sharing among patterns. Yet this construction
cannot cope with the more expressive patterns that involve counters. In order
to efficiently simulate models with counters, we need an efficient encoding that
preserves rigidity and that uses classical site-graph patterns.

Let us consider a case study so as to illustrate the need for counters in Kappa. 
This example is inspired by the behavior of the protein KaiC that is involved in
the synchronization of the proteins in the circadian clock. We consider one kind
of protein with n identified sites that can get phosphorylated. Indeed, n is equal
to 6 in the protein KaiC. We take n equal to 4 to make graphical representation
lighter. We will let n tend to infinity so as to empirically estimate
the combinatorial complexity of several encoding schemes.

The rate of phosphorylation/dephosphorylation of each site depends on the
number of sites that are already phosphorylated. In Fig. 1(a), we provide the 
example of a rule that phosphorylates the site a of the protein, assuming that 
the sites b and c are already phosphorylated and that the site d is not. Proteins 
are depicted as rectangles. Sites are depicted clockwise from the site a to the 
site d starting at the top left corner of the protein. Phosphorylation states are 
depicted with a black mark when the site is phosphorylated, and with a white 
mark otherwise. To fully encode this model in Kappa, we would require $n \cdot 2^n$
rules. Indeed, we need to decide whether this is a phosphorylation or a
dephosphorylation (2 possibilities), then on which site to apply the transformation (n
possibilities), then what the state of the other sites is ($2^{n-1}$ possibilities). This
combinatorial complexity may be reduced by the means of counters. We con- 
sider a fresh site (this site is depicted on the right of the protein) and we assume 
that this site takes numerical values. Writing each rule carefully, we can enforce 
that the value of this site is always equal to the number of the sites that are 
phosphorylated in the protein instance. Thanks to this invariant, describing our 
model requires $2 \cdot n$ rules according to whether we describe a phosphorylation
or a dephosphorylation (2 possibilities) and to which site the transformation is 
applied (n possibilities). An example of rule for the phosphorylation of the site a 
is given in Fig. 1(b). The notation @k assigns the value of the counter before the 
application of the rule to the variable k. Then the rate of the rule may depend 
on the value of k. This way, we can make the rate of phosphorylation depend on 
the number of sites already phosphorylated in the protein. Since there are only 
n sites that may be phosphorylated, it is straightforward to see that the counter 
may range only between the values 0 and n. 
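
The two rule counts discussed above can be checked by brute-force enumeration. The following Python sketch is only an illustration: the function names and the tuple encoding of rules are hypothetical and are not part of Kappa or of its tools.

```python
from itertools import product

DIRECTIONS = ("phosphorylation", "dephosphorylation")


def rules_without_counters(n):
    """List the rules needed when every rule documents the state of all n sites."""
    return [
        (direction, site, other_states)
        for direction in DIRECTIONS                               # 2 possibilities
        for site in range(n)                                      # n possibilities
        for other_states in product((False, True), repeat=n - 1)  # 2^(n-1) possibilities
    ]


def rules_with_counter(n):
    """List the rules needed when a counter site records the number of phosphorylated sites."""
    return [(direction, site) for direction in DIRECTIONS for site in range(n)]


for n in (4, 6):
    # Number of sites, rules without counters, rules with a counter.
    print(n, len(rules_without_counters(n)), len(rules_with_counter(n)))
```

For n = 4 the enumeration yields 64 rules without counters against 8 with a counter, and the gap widens exponentially as n grows.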

If only the number of phosphorylated sites matters, we can go even further:
we need just one counter and two rules, one for phosphorylating a new site
(e.g., see Fig. 1(c)) and one for dephosphorylating it. The value of the counter
is no longer related explicitly to a number of phosphorylated sites, thus we need
another way to specify that the value of the counter is bounded. We do this by
specifying in the precondition of the phosphorylation rule that it may be
applied only if the value of the counter is less than or equal to $n - 1$, which entails
that the value of the counter may range only between the values 0 and n.
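
As a sanity check of this bound, the Python sketch below explores the counter values reachable with the two abstract rules. It is a toy exploration under stated assumptions: the guard of the dephosphorylation rule (counter at least 1) is not given in the text and is assumed here by symmetry, and the function name is hypothetical.

```python
def reachable_counter_values(n, start=0):
    """Explore the counter values reachable from `start` with the two abstract rules.

    Phosphorylation: applicable if counter <= n - 1, then counter + 1.
    Dephosphorylation: applicable if counter >= 1 (assumed guard), then counter - 1.
    """
    seen = {start}
    frontier = [start]
    while frontier:
        k = frontier.pop()
        successors = []
        if k <= n - 1:
            successors.append(k + 1)
        if k >= 1:
            successors.append(k - 1)
        for value in successors:
            if value not in seen:
                seen.add(value)
                frontier.append(value)
    return seen


print(sorted(reachable_counter_values(4)))
```

Starting from 0 with n = 4, the exploration never leaves the range from 0 to 4, which matches the bound stated above.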

Not only does a parsimonious description of the mechanistic interactions in a model
ease the process of writing the model, enhance readability, and lead to more
efficient simulation, but it may also provide a better grain of observation of the
system behavior. In Fig. 2, we illustrate this by looking at three causal traces
that denote the same execution, but for three different encodings. Intuitively, 
causal traces [14,15] are inspired by event structures [43]. They describe sets 
of traces seen up to permutation of concurrent computation steps. The level of 
representation for the potential configurations of each protein impacts the way 
causality is defined, because what is tested in rules depends on the representation 
level. In our case study, the phosphorylation of each site is intuitively causally
independent: one site may be phosphorylated whatever the state of the other
sites is. Without counters, the only way to specify that the rate of phosphorylation
depends on the number of the sites that are already phosphorylated is to
detail the state of every site of the protein in the precondition of the rule. This
induces spurious causal relations (e.g., see Fig. 2(a)). Utilizing counters relaxes
this constraint. However, it is important to equip counters with arithmetic. Without
arithmetic, a rule may only set the value of a counter to a constant value.
Thus, for implementing counter increment, rules have to enumerate the potential
values of the counter before their applications, and set the value of this counter
accordingly. This again induces spurious causal relations (e.g., see Fig. 2(b)).
With arithmetic, incrementing counters becomes a generic operation that may
be applied independently of the current value of the counter. As a result, the
phosphorylation of the four sites can be seen as causally independent (e.g., see
Fig. 2(c)). This faithfully represents the fact that the phosphorylation of the four
sites may happen in arbitrary order.


Contribution. Now we describe the main contributions of this paper. 

In Sect. 2, we formalize a single push-out (SPO) semantics for Kappa with 
counters. Having a categorical framework dealing with counters, as opposed to 
implementing counters as syntactic sugar, is important. Firstly, this semantics 
will serve as a reference for the formal specification of the behavior of coun- 
ters. Secondly, the categorical setting of Kappa provides efficient ways to define 
causality [14,15], symmetries [25], and some sound symbolic reasoning on the
behavior of the number of occurrences of patterns [1,26] that are used in model 
reduction. Including counters in the categorical semantics of Kappa allows for 
extending the definition of these concepts to Kappa with counters for free. 

Yet different encodings of counters may be necessary to extend other tools for 
Kappa. In Sect. 3, we propose a couple of translations from Kappa with counters 
into Kappa without counters. The goal is to simulate models with counters effi- 
ciently without modifying the implementation of the Kappa simulator, KaSim 
[17]. The first encoding requires counters to be bounded from below and it sup- 
ports only two kinds of preconditions over counters: a rule may require the value 
of a counter to be equal to a given value, or to be greater than a given value. 
Requiring the value of a counter to be less than a given value is not supported. 
The second encoding supports both equality and inequality tests (in both directions),
but it requires the value of each counter to be bounded from above as well.

Static analysis is needed not only to prove these requirements, but also to 
retrieve the meaning of counters. In Sect. 4, we introduce a generic abstract
interpretation framework [9] to infer the properties of reachable states of a model. 
This framework is parametric with respect to a class of properties. In Sect. 5, we 
instantiate this framework with a relational numerical analysis aiming at relating 
the value of each counter to its interpretation with respect to the state of the 
other sites. This is used to detect and prove bounds on the range of counters. 
(a) Causal trace for the representation without counters.
(b) Causal trace for the representation with flat counters.
(c) Causal trace for the representation with arithmetical counters.


Fig. 2. Three causal traces. Each causal trace is made of a set of partially ordered 
computation steps. Roughly speaking, a computation step precedes another one if
the former is necessary to perform the latter. Each computation step is denoted as an
arrow labeled with the rule that implements it. In (a), counters are not used. Every
rule tests the full configuration of the protein. At this level of representation, the
k-th phosphorylation causally precedes the (k + 1)-th one, whatever the order in which
the sites have been phosphorylated. In (b), an additional site is used to record the 
number of phosphorylated sites in its internal state. With this encoding, the number 
of phosphorylated sites cannot be incremented without testing explicitly the internal 
state of the additional site. As a consequence, here again, at this level of representa- 
tion, each phosphorylation causally depends on the previous one. In (c), we use the 
expressiveness of arithmetic. We use generic rules to increment the counter regardless 
of its current value. Hence, at this level of representation, the phosphorylations of the
four sites become independent, which flattens the causal trace.


Related Works. Many modeling languages support arbitrary data-types. In 
Spatial-Kappa [41], counters encode the discrete position of agents. More gen- 
erally, in Chromar [29] and in colored Petri nets [30,35], agents may be tagged 
with values in arbitrary auxiliary programming languages. In ML-Rules [28], 
agents with attributes continuously diffuse within compartments and collide to 
interact. 

We have different motivations. Our goal is to enrich the state of proteins 
with some redundant information, so as to reduce the number of rules that are 
necessary to describe their mechanistic interactions. Also we want to avoid too 
expressive data-types, which could not be integrated within simulation, causal- 
ity analysis, and static analysis tools, without altering their performance. For 
instance, analysis of colored Petri nets usually relies on unfolding them into 
classical ones. Unfolding rule sets into classical ones does not scale because the 
number of rules would become intractable. Thus we need tools which deal directly
with counters. 

An encoding of two-counter machines has been proposed to show that most 
problems in Kappa are undecidable [19,34]. We represent counters the same way 
in our first encoding, but we provide atomic implementations for more primitives.

The number of isomorphism classes of connected components that may occur in
Kappa models during simulation is usually huge (if not infinite), which prevents
the use of agent-centric approaches [4]. For instance, one of the first non-toy
models written in Kappa involved more than $10^{19}$ kinds of bio-molecular
complexes [16,26]. Kappa follows a rule-centric approach which allows for the
description and the execution of models independently from the number of potential
complexes. Also, Kappa does not allow the diffusion of molecules to be described. Instead,
the state of the system is assumed to satisfy the well-mixed assumption. This
provides efficient ways to represent and update the distribution of potential
computation steps along a simulation [6,17].

Equivalent sites [3] or hyperlinks [31] offer promising solutions to extend the 
decision procedures to extract minimal causal traces in the case of counters, but 
the rigidity of graphs is lost. Our encodings rely neither on the use of equivalent 
sites, nor on expanding the rules into more refined and more numerous ones. 
Hence our encodings preserve the efficiency of the simulation. 

Our analysis is based on the use of affine relationships [32]. It relates counter 
values to the state of the other variables. Such relationships look like the ones 
that help understanding and proving the correctness of semaphores [20,21]. We 
use the decision procedure that is described in [23,24] to deduce bounds on the 
values of counters from the affine relationships. The cost of each atomic com- 
putation is cubic with respect to the number of variables. Abstract multi-sets 
[27,38] may succeed in expressing the properties of interest, but they require a 
parameter setting a bound on the values that they can abstract precisely. In practice,
their time-cost is exponential as soon as this bound is not chosen big enough. 
Our abstraction has an infinite height. It uses widening [11] and reduction [12] 
to discover the bounds of interest automatically. Octagons [36,37] have a cubic 
complexity, but they cannot express the properties involving more than two variables
which are required in our context. Polyhedra [13] express all the properties
needed, but at an exponential time-cost in practice.


2 Kappa 


In this section, we enrich the syntax and the operational semantics of Kappa so 
as to cope with counters. We focus on the single push-out (SPO) semantics. 


2.1 Signature 
Firstly we define the signature of a model. 


Definition 1 (signature). The signature of a model is defined as a tuple
$\Sigma = (\Sigma_{ag}, \Sigma_{site}, \Sigma_{int}, \Sigma^{int}_{ag\text{-}st}, \Sigma^{lnk}_{ag\text{-}st}, \Sigma^{\#}_{ag\text{-}st}, \mathrm{Prop}_{\#}, \mathrm{Update}_{\#})$ where:

1. $\Sigma_{ag}$ is a finite set of agent types,
2. $\Sigma_{site}$ is a finite set of site identifiers,
3. $\Sigma_{int}$ is a finite set of internal state identifiers,
4. $\Sigma^{int}_{ag\text{-}st}$, $\Sigma^{lnk}_{ag\text{-}st}$, and $\Sigma^{\#}_{ag\text{-}st}$ are three site maps (from $\Sigma_{ag}$ into $\wp(\Sigma_{site})$),
5. $\mathrm{Prop}_{\#}$ is a potentially infinite set of non-empty subsets of $\mathbb{Z}$,
6. $\mathrm{Update}_{\#}$ is a potentially infinite set of functions from $\mathbb{Z}$ to $\mathbb{Z}$ containing the
   identity function.

For every $G \in \mathrm{Prop}_{\#}$, we assume that for every function $f \in \mathrm{Update}_{\#}$, the set
$\{f(k) \mid k \in G\}$ belongs to the set $\mathrm{Prop}_{\#}$, and that for every element $k \in G$, the
set $\{k\}$ belongs to the set $\mathrm{Prop}_{\#}$ as well.


Agent types in $\Sigma_{ag}$ denote the agents of interest, the different kinds of proteins
for instance. A site identifier in $\Sigma_{site}$ represents an identified locus for a
capability of interaction. Each agent type $A \in \Sigma_{ag}$ is associated with a set of sites
$\Sigma^{int}_{ag\text{-}st}(A)$ with an internal state (i.e. a property), a set of sites $\Sigma^{lnk}_{ag\text{-}st}(A)$ which
may be linked, and a set of sites $\Sigma^{\#}_{ag\text{-}st}(A)$ with a counter. We assume without
any loss of generality that the three sets $\Sigma^{int}_{ag\text{-}st}(A)$, $\Sigma^{lnk}_{ag\text{-}st}(A)$, and $\Sigma^{\#}_{ag\text{-}st}(A)$ are
pairwise disjoint. The set $\mathrm{Prop}_{\#}$ contains the set of valid conditions that may be
checked on the value of counters, whereas the set $\mathrm{Update}_{\#}$ contains all the possible
update functions for the value of counters. We assume that every singleton
that is included in a valid condition is a valid condition as well. In this way, a
valid condition may be refined to a fully specified value. Additionally, the image
of a valid condition is required to be valid, so that the post-condition obtained
by applying an update function to a valid precondition is valid as well.


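Example 1 (running example). We define the signature for our case study as
the tuple $(\Sigma_{ag}, \Sigma_{site}, \Sigma_{int}, \Sigma^{int}_{ag\text{-}st}, \Sigma^{lnk}_{ag\text{-}st}, \Sigma^{\#}_{ag\text{-}st}, \mathrm{Prop}_{\#}, \mathrm{Update}_{\#})$ where:

1. $\Sigma_{ag} = \{P\}$;
2. $\Sigma_{site} = \{a, b, c, d, x\}$;
3. $\Sigma_{int} = \{\circ, \bullet\}$;
4. $\Sigma^{int}_{ag\text{-}st} = [P \mapsto \{a, b, c, d\}]$;
5. $\Sigma^{lnk}_{ag\text{-}st} = [P \mapsto \emptyset]$;
6. $\Sigma^{\#}_{ag\text{-}st} = [P \mapsto \{x\}]$;
7. $\mathrm{Prop}_{\#}$ is the set of all the convex parts of $\mathbb{Z}$;
8. $\mathrm{Update}_{\#}$ contains the function mapping each integer $n \in \mathbb{Z}$ to its successor,
   and the function mapping each integer $n \in \mathbb{Z}$ to its predecessor.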


The agent type P denotes the only kind of proteins. It has four sites a, b, c, d 
carrying an internal state and one site x carrying a counter. 
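
The two closure assumptions on valid conditions and update functions can be illustrated on the running example. The Python sketch below is a simplification made for illustration only: a convex part of the integers is encoded as a pair of optional bounds, and update functions are restricted to integer shifts (0 for the identity, +1 for the successor, -1 for the predecessor). All names are hypothetical.

```python
from typing import Optional, Tuple

# A convex part of Z encoded as (lower, upper); None stands for an unbounded side.
Interval = Tuple[Optional[int], Optional[int]]


def is_nonempty(prop: Interval) -> bool:
    """A valid condition must be a non-empty set of integers."""
    lower, upper = prop
    return lower is None or upper is None or lower <= upper


def image(prop: Interval, shift: int) -> Interval:
    """Image of a convex part under the update function k -> k + shift."""
    lower, upper = prop
    return (
        None if lower is None else lower + shift,
        None if upper is None else upper + shift,
    )


def contains(prop: Interval, k: int) -> bool:
    lower, upper = prop
    return (lower is None or lower <= k) and (upper is None or k <= upper)


def singleton(k: int) -> Interval:
    """The fully specified condition {k}, itself a convex part."""
    return (k, k)


at_most_2: Interval = (None, 2)         # the condition "counter <= 2"
after_increment = image(at_most_2, +1)  # the condition "counter <= 3"
print(after_increment, is_nonempty(after_increment))
print(singleton(2), contains(at_most_2, 2))
```

With this encoding, the image of a convex part under a shift is again a convex part, and each singleton included in a convex part is a convex part too, so both assumptions hold by construction.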


In the rest of the paper, we assume that a signature $\Sigma$ is given.


2.2 Site-Graphs 


Site-graphs describe both patterns and chemical mixtures. Their nodes are typed 
agents with some sites which may carry internal and binding states, and counters. 
(a) G1. (b) G2. (c) G3. (d) G4.
Fig. 3. Four site-graphs G1, G2, G3, and G4. 


Definition 2 (site-graph). A site-graph is a tuple $G = (A, type, S, L, p\kappa, c\kappa)$
where:

1. $A$ is a finite set of agents,
2. $type : A \to \Sigma_{ag}$ is a function mapping each agent to its type,
3. $S$ is a set of sites satisfying the following property:
   $S \subseteq \{(n,i) \mid n \in A, i \in \Sigma_{ag\text{-}st}(type(n))\}$,
4. $L$ maps the set:
   $\{(n,i) \in S \mid i \in \Sigma^{lnk}_{ag\text{-}st}(type(n))\}$
   to the set:
   $\{(n,i) \in S \mid i \in \Sigma^{lnk}_{ag\text{-}st}(type(n))\} \cup \{\dashv, -\}$,
   such that:
   (a) for any site $(n,i) \in S$, we have $L(n,i) \neq (n,i)$;
   (b) for any two sites $(n,i), (n',i') \in S$, we have $(n',i') = L(n,i)$ if and only
       if $(n,i) = L(n',i')$;
5. $p\kappa$ maps the set $\{(n,i) \in S \mid i \in \Sigma^{int}_{ag\text{-}st}(type(n))\}$ to the set $\Sigma_{int}$;
6. $c\kappa$ maps the set $\{(n,i) \in S \mid i \in \Sigma^{\#}_{ag\text{-}st}(type(n))\}$ to the set $\mathrm{Prop}_{\#}$.


For a site-graph G, we write $A_G$ for its set of agents, $type_G$ for its typing function,
$S_G$ for its set of sites, and $L_G$ for its set of links. Given a site-graph G, we write $S^{lnk}_G$
(resp. $S^{int}_G$, resp. $S^{\#}_G$) for its set of binding sites (resp. property sites, resp. counters),
that is to say the set of the sites $(n,i)$ such that $i \in \Sigma^{lnk}_{ag\text{-}st}(type_G(n))$ (resp.
$i \in \Sigma^{int}_{ag\text{-}st}(type_G(n))$, resp. $i \in \Sigma^{\#}_{ag\text{-}st}(type_G(n))$).

Let us consider a binding site $(n,i) \in S^{lnk}_G$. Whenever $L_G(n,i) = \dashv$, the site
$(n,i)$ is free. Various levels of information may be given about the sites that are
bound. Whenever $L_G(n,i) = -$, the site $(n,i)$ is bound to an unspecified site.
Whenever $L_G(n,i) = (n',i')$ (and hence $L_G(n',i') = (n,i)$), the sites $(n,i)$ and
$(n',i')$ are bound together.

A chemical mixture is a site-graph in which the state of each site is fully
specified. Formally, a site-graph G is a chemical mixture if and only if the
three following properties are satisfied:

1. the set $S_G$ is equal to the set $\{(n,i) \mid n \in A_G, i \in \Sigma_{ag\text{-}st}(type_G(n))\}$;
2. every binding site is free or bound to another binding site (i.e. for every
   $(n,i) \in S^{lnk}_G$, $L_G(n,i) \neq -$);
3. every counter has a single value (i.e. for every $(n,i) \in S^{\#}_G$, $c\kappa_G(n,i)$ is a
   singleton).


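Example 2 (running example). In Fig. 3, we give a graphical representation of
the four site-graphs G1, G2, G3, and G4 that are defined as follows:

1. (a) $A_{G_1} = \{1\}$,
   (b) $type_{G_1} = [1 \mapsto P]$,
   (c) $S_{G_1} = \{(1,a), (1,x)\}$,
   (d) $L_{G_1} = \emptyset$,
   (e) $p\kappa_{G_1} = [(1,a) \mapsto \circ]$,
   (f) $c\kappa_{G_1} = [(1,x) \mapsto \{k \in \mathbb{Z} \mid k \le 2\}]$;
2. (a) $A_{G_2} = \{1\}$,
   (b) $type_{G_2} = [1 \mapsto P]$,
   (c) $S_{G_2} = \{(1,x)\}$,
   (d) $L_{G_2} = \emptyset$,
   (e) $p\kappa_{G_2} = [\,]$,
   (f) $c\kappa_{G_2} = [(1,x) \mapsto \{k \in \mathbb{Z} \mid k \le 2\}]$;
3. (a) $A_{G_3} = \{1\}$,
   (b) $type_{G_3} = [1 \mapsto P]$,
   (c) $S_{G_3} = \{(1,a), (1,x)\}$,
   (d) $L_{G_3} = \emptyset$,
   (e) $p\kappa_{G_3} = [(1,a) \mapsto \ldots]$,
   (f) $c\kappa_{G_3} = [(1,x) \mapsto \{k \in \mathbb{Z} \mid k \le 3\}]$;
4. (a) $A_{G_4} = \{1\}$,
   (b) $type_{G_4} = [1 \mapsto P]$,
   (c) $S_{G_4} = \{(1,a), (1,b), (1,c), (1,d), (1,x)\}$,
   (d) $L_{G_4} = \emptyset$,
   (e) $p\kappa_{G_4} = [(1,a) \mapsto \circ, (1,b) \mapsto \bullet, (1,c) \mapsto \bullet, (1,d) \mapsto \circ]$,
   (f) $c\kappa_{G_4} = [(1,x) \mapsto \{2\}]$.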

The white site on the side of proteins is always the site x. The other sites, starting 
from the top-left one denote the sites a, b, c, and d clockwise. 
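
To make the notion of chemical mixture concrete, here is a minimal Python sketch of site-graphs over the running signature, together with a check of the three properties above. The encoding is hypothetical: conditions on counters are pairs of optional bounds, the strings "free" and "bound" stand for a free site and a site bound to an unspecified partner, internal states are named "white" and "black", and the set of all sites of an agent type is assumed to be the union of the three site maps.

```python
from dataclasses import dataclass, field

# Running signature: one agent type P, property sites a to d, counter site x.
PROPERTY_SITES = {"P": {"a", "b", "c", "d"}}
BINDING_SITES = {"P": set()}
COUNTER_SITES = {"P": {"x"}}


def all_sites(agent_type):
    """All the sites of an agent type (assumed: union of the three site maps)."""
    return PROPERTY_SITES[agent_type] | BINDING_SITES[agent_type] | COUNTER_SITES[agent_type]


@dataclass
class SiteGraph:
    types: dict                                   # agent -> agent type
    sites: set                                    # documented sites, as (agent, site) pairs
    links: dict = field(default_factory=dict)     # binding site -> "free", "bound", or a site
    props: dict = field(default_factory=dict)     # property site -> internal state
    counters: dict = field(default_factory=dict)  # counter site -> (lower, upper), None = unbounded


def is_chemical_mixture(g):
    # 1. every site of every agent is documented
    expected = {(n, i) for n, t in g.types.items() for i in all_sites(t)}
    if g.sites != expected:
        return False
    # 2. no binding site is bound to an unspecified site
    if any(state == "bound" for state in g.links.values()):
        return False
    # 3. every counter has a single value
    return all(lo is not None and lo == hi for lo, hi in g.counters.values())


G1 = SiteGraph(
    types={1: "P"},
    sites={(1, "a"), (1, "x")},
    props={(1, "a"): "white"},
    counters={(1, "x"): (None, 2)},
)
G4 = SiteGraph(
    types={1: "P"},
    sites={(1, s) for s in "abcdx"},
    props={(1, "a"): "white", (1, "b"): "black", (1, "c"): "black", (1, "d"): "white"},
    counters={(1, "x"): (2, 2)},
)
print(is_chemical_mixture(G1), is_chemical_mixture(G4))
```

In this encoding, G1 is only a pattern, since it documents two sites and constrains the counter to a range, whereas G4 documents every site and fixes the counter to the value 2.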


2.3 Sliding Embeddings 


In classical Kappa, two site-graphs may be related by structure-preserving injec- 
tions, which are called embeddings. Here, we extend their definition to cope with 
counters. There are two main issues: a rule may require the value of a given 
counter to belong to a non-singleton set; also updating counters may involve 
arithmetic computations. The smaller the set of the potential values for a counter 
is, the more information we have. Thus, embeddings may map the potential val- 
ues of a given counter into a subset. In order to cope with update functions, 
we equip embeddings with some arithmetic functions which explain how to get 
from the value of the counter in the source of the embedding to its value in the 
target. This way, our embeddings not only define instances of site-graphs, but 
they also contain the information to compute the values of counters. 


(a) A sliding embedding. (b) A pure embedding. (c) A pure embedding. 


Fig. 4. Three sliding embeddings from the site-graph G2 into the site-graphs G3, G1,
and G4, respectively. Only the second and the third embeddings are pure.


Definition 3 (sliding embedding). A sliding embedding $h : G \rightsquigarrow H$
from a site-graph G into a site-graph H is a pair $(h_e, h_{\#})$ where $h_e$ is a function
of agents $h_e : A_G \to A_H$ and $h_{\#}$ is a function mapping the counters of the
site-graph G to update functions $h_{\#} : S^{\#}_G \to \mathrm{Update}_{\#}$ such that for all agent
identifiers $m, n, n' \in A_G$ and for all site identifiers $i \in \Sigma_{ag\text{-}st}(type_G(n))$,
$i' \in \Sigma_{ag\text{-}st}(type_G(n'))$, the following properties are satisfied:

1. if $m \neq n$, then $h_e(m) \neq h_e(n)$;
2. $type_G(n) = type_H(h_e(n))$;
3. if $(n,i) \in S_G$, then $(h_e(n), i) \in S_H$;
4. if $(n,i) \in S^{lnk}_G$ and $L_G(n,i) = (n',i')$, then $L_H(h_e(n), i) = (h_e(n'), i')$;
5. if $(n,i) \in S^{lnk}_G$ and $L_G(n,i) = \dashv$, then $L_H(h_e(n), i) = \dashv$;
6. if $(n,i) \in S^{lnk}_G$ and $L_G(n,i) = -$, then $L_H(h_e(n), i) \in \{-\} \cup S^{lnk}_H$;
7. if $(n,i) \in S^{int}_G$ and $p\kappa_G(n,i) = \iota$, then $p\kappa_H(h_e(n), i) = \iota$;
8. if $(n,i) \in S^{\#}_G$, then $c\kappa_H(h_e(n), i) \subseteq \{h_{\#}(n,i)(k) \mid k \in c\kappa_G(n,i)\}$.

Two sliding embeddings between site-graphs, from E to F, and from F to
G respectively, compose to form a sliding embedding from E to G (functions
compose pair-wise). A sliding embedding $(h_e, h_{\#})$ such that $h_{\#}$ maps each counter
to the identity function is called a pure embedding. A pure embedding from E to
F is denoted as $E \to F$. Pure embeddings compose. Two site-graphs E and F
are isomorphic if and only if there exist a pure embedding from E to F and a pure
embedding from F to E. A pure embedding between two isomorphic site-graphs
is called an isomorphism. When it exists, the unique pure embedding $(h_e, h_{\#})$
from a site-graph E into the site-graph F such that $A_E \subseteq A_F$ and $h_e(n) = n$
for every agent $n \in A_E$, is called the inclusion from E to F and is denoted as
$i_{E,F}$ or as $E \hookrightarrow F$. In such a case, we say that the site-graph E is included in
the site-graph F. The inclusion from a site-graph into itself always exists and is
called an identity embedding.


Example 3 (running example). We show in Fig. 4 three sliding embeddings from
the site-graph G2 respectively into the site-graphs G3, G1, and G4. The first of
these three sliding embeddings is assumed to increment the value of the counter
of the site x. The last two embeddings are pure.
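
The three embeddings of this example can be checked mechanically against the definition of sliding embeddings. The Python sketch below is a simplified illustration with a hypothetical encoding: site-graphs are dictionaries, conditions on counters are pairs of optional bounds, and update functions are integer shifts. Only the conditions on agents, sites, internal states and counters are checked, since the running signature has no binding site; the internal state of site a in G3 plays no role for embeddings from G2 and is left out.

```python
def shift(interval, delta):
    """Image of a convex part (lower, upper) under the update k -> k + delta."""
    lower, upper = interval
    return (None if lower is None else lower + delta,
            None if upper is None else upper + delta)


def included(inner, outer):
    """Inclusion between convex parts; None stands for an unbounded side."""
    (il, iu), (ol, ou) = inner, outer
    lower_ok = ol is None or (il is not None and il >= ol)
    upper_ok = ou is None or (iu is not None and iu <= ou)
    return lower_ok and upper_ok


def is_sliding_embedding(g, h, agent_map, counter_shift):
    if len(set(agent_map.values())) != len(agent_map):
        return False                                # agents are mapped injectively
    for n, m in agent_map.items():
        if g["types"][n] != h["types"][m]:
            return False                            # agent types are preserved
    for (n, i) in g["sites"]:
        if (agent_map[n], i) not in h["sites"]:
            return False                            # documented sites are preserved
    for (n, i), state in g["props"].items():
        if h["props"].get((agent_map[n], i)) != state:
            return False                            # internal states are preserved
    for (n, i), values in g["counters"].items():
        target = h["counters"][(agent_map[n], i)]
        if not included(target, shift(values, counter_shift[(n, i)])):
            return False                            # target values lie in the shifted source values
    return True


G1 = {"types": {1: "P"}, "sites": {(1, "a"), (1, "x")},
      "props": {(1, "a"): "white"}, "counters": {(1, "x"): (None, 2)}}
G2 = {"types": {1: "P"}, "sites": {(1, "x")},
      "props": {}, "counters": {(1, "x"): (None, 2)}}
G3 = {"types": {1: "P"}, "sites": {(1, "a"), (1, "x")},
      "props": {}, "counters": {(1, "x"): (None, 3)}}
G4 = {"types": {1: "P"}, "sites": {(1, s) for s in "abcdx"},
      "props": {(1, "a"): "white", (1, "b"): "black", (1, "c"): "black", (1, "d"): "white"},
      "counters": {(1, "x"): (2, 2)}}

identity, increment = {(1, "x"): 0}, {(1, "x"): 1}
print(is_sliding_embedding(G2, G3, {1: 1}, increment))  # sliding embedding with an increment
print(is_sliding_embedding(G2, G3, {1: 1}, identity))   # pure candidate from G2 to G3
print(is_sliding_embedding(G2, G1, {1: 1}, identity))   # pure embedding
print(is_sliding_embedding(G2, G4, {1: 1}, identity))   # pure embedding
```

Under this encoding, the map from G2 to G3 is accepted only when the counter is shifted by one, while the maps into G1 and G4 are accepted with the identity update.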


Let L, R, and D be three site-graphs, such that R is included in D, and let
$\varphi$ be a sliding embedding from L into D. Then there exist a site-graph D' that
is included in L and a sliding embedding $\psi$ from D' to R such that $i_{R,D}\,\psi =
\varphi\, i_{D',L}$, and such that D' is maximal (w.r.t. inclusion among site-graphs) for this
property. The pair $(D', i_{D',L}, \psi)$ is called the pull-back of the pair $(\varphi, i_{R,D})$.

Let L, R, and D be three site-graphs such that D is included in L. A partial
sliding embedding from L into R is defined as a pair made of the inclusion $i_{D,L}$
and a sliding embedding from D to R. Sliding embeddings may be considered as
partial sliding embeddings with the inclusion as the identity embedding. Partial
sliding embeddings compose by the means of a pull-back (e.g. see Fig. 5(b)).


2.4 Rules 


Rules represent transformations between site-graphs. For the sake of simplicity,
we only use a fragment of Kappa (we assume here that there are no side effects).
Rules may break and create bonds between pairs of sites, change the properties
of sites, and update the value of counters. They may also create and remove agents.
When an agent is created, all its sites must be fully specified: binding sites may
be either free or bound to a specific site, and the value of each counter must be a
singleton. So as to ensure that there is no side effect when an agent is removed,
we also assume that the binding sites of removed agents are fully specified. These
requirements are formalized as follows:

Definition 4 (rule). A rule is a partial sliding embedding
$L \hookleftarrow D \xrightarrow{(h_e,h_g)} R$ such that:


1. (modified agents) for all agents $n \in A_D$ such that $h_e(n) \in A_R$ and for every
site identifier $i \in \Sigma_{\text{ag-st}}(\mathrm{type}_L(n))$,
(a) the site $(n,i)$ belongs to the set $S_L$ if and only if $(h_e(n), i)$ belongs to the set
$S_R$;
(b) if the site $(n,i)$ belongs to the set $S_L^{lnk}$, then either $L_L(n,i) = -$ and
$L_R(h_e(n),i) = -$, or $L_L(n,i) \in S_L^{lnk} \cup \{\dashv\}$ and $L_R(h_e(n), i) \in S_R^{lnk} \cup \{\dashv\}$;
(c) if the site $(n,i)$ belongs to the set $S_L^{c}$, then the sets $c\kappa_R(h_e(n),i)$ and
$\{[h_g(n,i)](v) \mid v \in c\kappa_L(n,i)\}$ are equal.
2. (removed agents) for all agents $n \in A_L$ such that $n \notin A_D$, for every site
identifier $i \in \Sigma^{lnk}_{\text{ag-st}}(\mathrm{type}_L(n))$, $(n,i) \in S_L^{lnk}$ and $L_L(n,i) \in S_L^{lnk} \cup \{\dashv\}$.
3. (created agents) for all agents $n \in A_R$ for which there exists no $n' \in A_D$ such
that $n = h_e(n')$, and for every site identifier $i \in \Sigma_{\text{ag-st}}(\mathrm{type}_R(n))$,
(a) the site $(n, i)$ belongs to the set $S_R$;
(b) if the site $(n,i)$ belongs to the set $S_R^{lnk}$, then the binding state $L_R(n, i)$
belongs to the set $S_R^{lnk} \cup \{\dashv\}$;
(c) if the site $(n,i)$ belongs to the set $S_R^{c}$, then $c\kappa_R(n,i)$ is a singleton.


In Definition 4, each agent that is modified occurs on both sides of a
rule. Constraint 1a ensures that both sides document the same sites. Constraint 1b
ensures that, if the binding state of a site is modified, then it has to be fully
specified (either free, or bound to a specific site) on both sides of the rule.
Constraint 1c ensures that the post-condition associated to a counter is the direct
image of its precondition by its update function. Constraint 2 ensures that the
agents that are removed have their binding sites fully specified. Constraint 3a
ensures that, in the agents that are created, all the sites are documented. Besides,
constraint 3b requires that the state of their binding sites is either free or bound
to a specific site. Constraint 3c ensures that their counters have a single value.

An example of a rule is given in Fig. 6(a). 

A rule $L \hookleftarrow D \xrightarrow{(h_e,h_g)} R$ is usually denoted as $L \to R$ (leaving the common
region and the sliding embedding implicit). Rules are applied to site-graphs via
pure embeddings using the single push-out construction [22].


Definition 5 (rule application [14]). Let r be a rule $L \to R$, $L'$ be a site-graph,
and $h_L$ be a pure embedding from $L$ to $L'$. Then, there exists a rule
$r' : L' \to R'$ and a pure embedding $h_R : R \to R'$ such that the following
properties are satisfied (e.g. see Fig. 6(c)):


1. $h_R\, r = r'\, h_L$;

2. for all rules $r''$ between the site-graph $L'$ and a site-graph $R''$ and all embeddings
$h'_R$ from $R$ into $R''$ such that $h'_R\, r = r''\, h_L$, there exists a unique pure
embedding $h$ from $R'$ into $R''$ such that $r'' = h\, r'$ and $h'_R = h\, h_R$.


Moreover, whenever the site-graph $L'$ is a chemical mixture, the site-graph $R'$ is
a chemical mixture as well.


We write $L' \xrightarrow{r} R'$ for a transition from the state L' into the state R' via an
application of a rule r. Usually transition labels also mention the pure embedding
($h_L$ here), but we omit it since we do not use it in the rest of the paper.

Example 4 (running example). An example of rule application is depicted in
Fig. 6. We consider the rule r that takes a protein with the site a unphospho- 
rylated and a counter with a value at least equal to 2, and that phosphorylates 
the site a while incrementing the counter by 1 (e. g. see Fig. 6(a)). Note that 
the update function of the counter is written next to its post-condition in the 
right hand side of the rule. We apply the rule to a protein with the sites b and 
c phosphorylated, the site d unphosphorylated, and the counter equal to 2 (e. g. 
see Fig. 6(b)). The result is a protein with the sites a, b, and c phosphorylated, 
the site d unphosphorylated and the counter equal to 3 (e. g. see Fig. 6(d)). 
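
The rule application of Example 4 can be mimicked on a flat dictionary. This is only a sketch: the state names `"u"` and `"p"`, the key `"counter"`, and the function `apply_rule` are hypothetical and chosen for illustration; the guard follows the wording of Example 4.

```python
def apply_rule(protein):
    """Apply the toy rule of Example 4; return the new state, or None if the guard fails."""
    # Guard: site a is unphosphorylated and the counter value is at least 2.
    if protein["a"] != "u" or protein["counter"] < 2:
        return None
    result = dict(protein)
    result["a"] = "p"                           # phosphorylate the site a
    result["counter"] = protein["counter"] + 1  # update function: k -> k + 1
    return result


# Sites b and c phosphorylated, site d unphosphorylated, counter equal to 2.
state = {"a": "u", "b": "p", "c": "p", "d": "u", "counter": 2}
print(apply_rule(state))
```

Applying the function a second time to the result returns `None`, since the site a is then already phosphorylated.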


A model M over a given signature $\Sigma$ is defined as the pair $(G_0, \mathcal{R})$ where
$G_0$ is a chemical mixture, representing the initial state, and $\mathcal{R}$ is a set of rules.
Each rule is associated with a functional rate which maps each potential tuple
of values for the counters of the left hand side of the rule to a non-negative real
number. We write C(M) for the set of states obtained from $G_0$ by applying a
potentially empty sequence of rules in $\mathcal{R}$.


3 Encoding Counters 


In this section, we introduce two encodings from Kappa with counters into Kappa
without counters. As explained in Sect. 1, our goal is to preserve the rigidity of
site-graphs and to avoid the blow-up of the number of rules in the target model.
This is mandatory to preserve the good performance of the Kappa simulator.
Both encodings rely on syntactic restrictions over the preconditions and the
update functions that may be applied to counters, and on semantic ones about
the potential range of counters. In Sects. 4 and 5, we provide a static analysis to
check whether or not these semantic assumptions hold.


3.1 Encoding the Value of Counters as Unbounded Chains of 
Agents 


In this encoding, each counter is bound to a chain of fictitious agents the length of
which minus 1 denotes the value of the counter (another encoding not requiring
the subtraction is possible but it would require side-effects). Encoding counters
as chains of agents has already been used in the implementation of two-counter
machines in Kappa [19,34]. We slightly extend these works to implement
more atomic operations over counters. We assume that the value of counters is
bounded from below. For the sake of simplicity, we assume that counters range in
$\mathbb{N}$, but arbitrary lower bounds may be considered by shifting each value accordingly.
We denote by $\mathcal{N}_1$ the set of the site-graphs that have a counter with a
negative value. They are considered as erroneous states, since they may not be
encoded with chains of agents.

Only two kinds of guards are handled. A rule may require the value of a
counter to be equal to a given number or to be greater than a given number.
Rules testing whether a value is less than a given number
require unfolding each such rule into several ones (one per potential value). Also,
when the rate of a rule depends on the value of some counters, we unfold each
rule according to the value of these counters, so that the rate of each unfolded
rule is a constant (the Kappa simulator requires all the instances of a given rule
in a given simulation state to have the same rate, for efficiency concerns). For
update functions, we only consider constant functions and the functions that
increase/decrease the value of counters by a fixed value. Testing whether the
value of a counter is equal to (resp. greater than) n can be done by requiring
the corresponding chain to contain exactly (resp. at least) n + 1 agents (e.g. see
Figs. 7(b) and (c)). Incrementing (resp. decrementing) the value of a counter is
modeled by inserting (resp. removing) agents at the beginning of its chain (e.g. see
Fig. 7(d), resp. Fig. 7(e)). Setting a counter to a fixed value requires detaching
its full chain in order to create a new one of the appropriate length (e.g. see
Fig. 7(f)). In such a case, the former chain remains as junk. Thus the state
of the model must be understood up to insertion of junk agents. We introduce
the function $gc_1$ that removes every chain of spurious agents not bound to any
counter. We denote as $[\![G]\!]_1$ (resp. $[\![r]\!]_1$) the encoding of a site-graph G (resp. of
a rule r).

Fig. 7. Encoding the value of counters as unbounded chains of agents.
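
The chain encoding can be pictured with a small Python model in which a list plays the role of the chain of fictitious agents. The class `ChainCounter`, its method names, and the function `gc` are assumptions of this sketch; they only mirror the operations described above and are not the encoding as implemented in KaSim.

```python
class ChainCounter:
    """Counter whose value is the length of its chain of fictitious agents, minus 1."""

    def __init__(self, value):
        if value < 0:
            raise ValueError("negative values cannot be encoded as a chain")
        self.chain = [object() for _ in range(value + 1)]
        self.junk = []  # detached chains left behind by assignments

    def value(self):
        return len(self.chain) - 1

    def equals(self, n):
        # The chain contains exactly n + 1 agents.
        return len(self.chain) == n + 1

    def at_least(self, n):
        # The chain contains at least n + 1 agents, which holds when value >= n.
        return len(self.chain) >= n + 1

    def increment(self, k=1):
        # Insert k agents at the beginning of the chain.
        self.chain[:0] = [object() for _ in range(k)]

    def decrement(self, k=1):
        # Remove k agents from the beginning of the chain.
        if k > self.value():
            raise ValueError("counter would become negative")
        del self.chain[:k]

    def assign(self, value):
        # Detach the whole chain (it remains as junk) and create a fresh one.
        self.junk.append(self.chain)
        self.chain = [object() for _ in range(value + 1)]


def gc(counter):
    """Remove every junk chain, in the spirit of the garbage-collection function."""
    counter.junk.clear()
    return counter
```

In this model, comparing two states "up to junk" amounts to comparing them after calling `gc` on both.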


3.2 Encoding the Value of Counters as Circular Lists of Agents 


In this second encoding, each counter is bound to a ring of agents. Each such
agent has three binding sites zero, pred, and next, and a property site value
which may be activated, or not. In a ring, agents are connected circularly through
their sites pred and next. Exactly one agent per ring is bound to a counter and
exactly one agent per ring has the site value activated. The value of the counter
is encoded by the distance between the agent bound to the counter and the agent
that is activated, scanning the agents by following the direction given by the site
next of each agent (clockwise in the graphical representation). We have to
consider that counter values are bounded from above and below. Without any
loss of generality, we assume that the length of each ring is the same, that is to
say that counters range from 0 to n − 1, for a given $n \in \mathbb{N}$. We denote by $\mathcal{N}_2$ the
set of the site-graphs with at least one counter not satisfying these bounds.

Fig. 8. Encoding the value of counters as circular lists of agents.


Compared to the first encoding, this one may additionally cope with testing
that a counter has a value less than a given constant without having to unfold the
rule. Both encodings may deal with the same update functions. Testing whether
a counter is equal to a value is done by requiring that the activated agent is at
the appropriate distance from the agent that is connected to the counter (e.g. see
Fig. 8(b)). It is worth noting that the intermediary agents are required not to be
activated. This is not mandatory for the soundness of the encoding; it is
an optimization that helps the simulator detect early that no embedding
may associate a given agent of the left hand side of a rule to a given agent in the
current state of the system. Inequalities are handled by checking that enough
agents, starting from the one that is connected to the counter and in the direction
specified by the direction of the inequality, are not activated (e.g. see Fig. 8(c)).
Incrementing/decrementing the value of a counter is modeled by making the counter
glide along the ring (e.g. see Figs. 8(d) and (e)). Special care has to be taken
to ensure that the activated agent never crosses the agent linked to the counter
(which would cause a numerical wrap-around). Assigning a given value to a
counter requires entirely removing the ring and replacing it with a fresh one
(e.g. see Fig. 8(f)). It may be efficiently implemented without memory allocation.
As in the first encoding, when the rate of a rule depends on the value of some
counters, we unfold each rule according to the value of these counters, so that
the rate of each unfolded rule is constant.

We introduce the function $gc_2$ as the identity function over site-graphs (there
are no junk agents in this encoding). We denote as $[\![G]\!]_2$ (resp. $[\![r]\!]_2$) the encoding
without counter of a site-graph G (resp. of a rule r).


3.3 Correspondence
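
The ring encoding can be modeled with two indices on a ring of fixed size: the position of the agent bound to the counter and the position of the activated agent. The class `RingCounter` and its methods are hypothetical names for this sketch, and the exact pattern used for the "less than" test is our reading of the text (the figure is not reproduced here), so it should be taken as an assumption.

```python
class RingCounter:
    """Counter ranging over 0..size-1, stored as two positions on a ring."""

    def __init__(self, size, value=0):
        if not 0 <= value < size:
            raise ValueError("value out of range")
        self.size = size
        self.anchor = 0      # index of the agent bound to the counter
        self.active = value  # index of the agent whose site value is activated

    def _activated(self, distance):
        # Is the agent `distance` steps after the anchor (following next) activated?
        return (self.anchor + distance) % self.size == self.active

    def value(self):
        return (self.active - self.anchor) % self.size

    def equals(self, n):
        # The agent at distance n is activated and the intermediary ones are not.
        return self._activated(n) and not any(self._activated(d) for d in range(n))

    def at_least(self, n):
        # The first n agents, starting from the anchor, are not activated.
        return not any(self._activated(d) for d in range(n))

    def less_than(self, n):
        # Assumed pattern: the size - n agents reached by following pred are not activated.
        return not any(self._activated(-d) for d in range(1, self.size - n + 1))

    def increment(self, k=1):
        # The counter glides backwards, so the distance to the activated agent grows.
        if self.value() + k > self.size - 1:
            raise OverflowError("activated agent would cross the counter")
        self.anchor = (self.anchor - k) % self.size

    def decrement(self, k=1):
        if self.value() - k < 0:
            raise OverflowError("activated agent would cross the counter")
        self.anchor = (self.anchor + k) % self.size

    def assign(self, value):
        # Replace the ring with a fresh one of the same size.
        self.__init__(self.size, value)
```

The overflow checks play the role of the "special care" mentioned above: without them, gliding past the activated agent would silently wrap the value around.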


The following theorem states that, whenever there is no numerical overflow and 
providing that junk agents are neglected, the semantics of Kappa with counters 
and the semantics of their encodings are in bisimulation. 


Theorem 1 (correspondence). Let i be either 1 or 2. Let G be a fully specified
site-graph such that $G \notin \mathcal{N}_i$ and r be a rule. Both following properties are
satisfied:


1. whenever there exists a site-graph $G'$ such that $G \xrightarrow{r} G'$ and $G' \notin \mathcal{N}_i$, there
exists a site-graph $G_c$ such that $[\![G]\!]_i \xrightarrow{[\![r]\!]_i} G_c$ and $[\![G']\!]_i = gc_i(G_c)$;

2. whenever there exists a site-graph $G_c$ such that $[\![G]\!]_i \xrightarrow{[\![r]\!]_i} G_c$, there exists a
site-graph $G'$ such that $G \xrightarrow{r} G'$, $G' \notin \mathcal{N}_i$, and $[\![G']\!]_i = gc_i(G_c)$.


3.4 Benchmarks 


The experimental evaluation of the impact of both encodings on the performance
of the simulator KaSim [6,17] is presented in Fig. 9. We focus on the example that
has been presented in Sect. 1. We plot the number of events that are simulated
per second of CPU. For the sake of comparison, we also provide the simulation
efficiency of the simulator NFSim [40] on the models written in BNGL with
equivalent sites (with a linear number of rules only).

We notice that, with KaSim, the direct approach (without counter) is the
most efficient when there are fewer than 9 phosphorylation sites. We explain this
overhead by the fact that each encoding utilizes spurious agents that have to
be allocated in memory and relies on rules with bigger left hand sides. Nevertheless,
this overhead is reasonable if we consider the gain in conciseness in the
description of the models. The versions of models with counters rely on a linear
number of rules, which makes models easier to read, document, and update. For
more phosphorylation sites, simulation time for models written without counters
blows up very quickly, due to the large number of rules. The simulation of the
models with counters scales much better for both encodings.

Models can be concisely described in BNGL without using counters, by the
means of equivalent sites. Each version of the model uses n indistinguishable sites
and only a linear number of rules is required. However, detecting the potential
applications of rules in the case of equivalent sites relies on the sub-graph isomorphism
problem on general graphs, which prevents the approach from scaling to
large values of n. We observe that the efficiency of NFSim on this family of examples
is not as good as that of KaSim (whichever of the three modeling
methods is used). We also observe a very quick deterioration of the performance
starting at n equal to 5.

Fig. 9. Efficiency of the simulation for the example in Sect. 1 with n ranging between
1 and 14. We test the simulator KaSim with a version of the models written without 
counters and versions of the models according to both encodings (including the n 
phosphorylation sites). For the sake of comparison, we also compare with the efficiency 
of the simulator NFSim with the same model but written in BNGL by the means of 
equivalent sites. For each version of the model and each simulation method, we run 
15 simulations of 10° events on an initial state made of 100 agents and we plot the 
number of computation steps computed on average per second of CPU on a log scale.
Every simulation has been performed on 4 processors: Intel(R) Xeon(R) CPU E5-2609 0
@ 2.40 GHz, 126 GB of RAM, running Ubuntu 18.04.


4 Generic Abstraction of Reachable States 


So far, we have provided two encodings to compile Kappa with counters into 
Kappa without counters. These encodings are sound under some assumptions 
over the range of counters. Now we propose a static analysis not only to check 
that these conditions are satisfied, but also to infer the meaning of the counters 
(in our case study, that they are equal to the number of phosphorylated sites). 

Firstly, we provide a generic abstraction to capture the properties of the 
states that a Kappa model may potentially take. Our abstraction is parametric 
with respect to the class of properties. It will be instantiated in Sect. 5. Our 
analysis is not complete: not all the properties of the program are discovered; 
nevertheless, the result is sound: all the properties that are captured, are correct. 


4.1 Collecting Semantics 


Let Q be the set of all the site-graphs. We are interested in the set C(M) of all
the states that a model $M = (G_0, \mathcal{R})$ may take in 0, 1, or more computation
steps. This is the collecting semantics [7]. By [33], it may be expressed as the
least fixpoint of the $\cup$-complete endomorphism F on the complete lattice $\wp(Q)$
that is defined as $F(X) = \{G_0\} \cup \{q' \mid \exists q \in X, r \in \mathcal{R} \text{ such that } q \xrightarrow{r} q'\}$. By
[42], the collecting semantics is also equal to the meet of all the post-fixpoints
of the function F (i.e. $C(M) = \bigcap \{X \in \wp(Q) \mid F(X) \subseteq X\}$), that is to say the
strongest inductive invariant of our model that is satisfied by the initial state.
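
When the set of reachable states is finite, the least fixpoint of F can be computed by plain iteration from the empty set. The sketch below assumes a toy setting in which states are integers and each rule is a Python function returning the list of successor states; `collecting_semantics`, `increment`, and `reset` are hypothetical names for this illustration.

```python
def collecting_semantics(initial, rules):
    """Iterate F(X) = {initial} | {q' | q in X, r in rules, q -r-> q'} from the empty set."""
    def F(X):
        return {initial} | {q2 for q in X for rule in rules for q2 in rule(q)}

    X = set()
    while F(X) != X:  # stop at the first fixpoint, which is the least one
        X = F(X)
    return X


def increment(q):
    # Toy rule: increment a counter as long as it is below 3.
    return [q + 1] if q < 3 else []


def reset(q):
    # Toy rule: reset the counter once it has reached 3.
    return [0] if q == 3 else []


print(collecting_semantics(0, [increment, reset]))
```

This loop need not terminate on models with infinitely many reachable states, which is one motivation for the abstraction of the next section.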


4.2 Generic Abstraction 


The collecting semantics is usually not decidable. We use the Abstract Interpretation
framework [9,10] to compute a sound approximation of it.


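Definition 6 (abstraction). A tuple $A = (Q^\sharp, \sqsubseteq, \gamma, \sqcup, \bot, I^\sharp, t^\sharp, \nabla)$ is called
an abstraction when all following conditions are satisfied:


1. the pair $(Q^\sharp, \sqsubseteq)$ is a pre-order of abstract properties;
2. the component $\gamma : Q^\sharp \to \wp(Q)$ is a monotonic map (i.e. for every two
abstract elements $q_1^\sharp, q_2^\sharp \in Q^\sharp$ such that $q_1^\sharp \sqsubseteq q_2^\sharp$, we have $\gamma(q_1^\sharp) \subseteq \gamma(q_2^\sharp)$);
3. the component $\sqcup$ maps each finite set of abstract properties $X^\sharp \in \wp_{finite}(Q^\sharp)$ to
an abstract property $\sqcup(X^\sharp) \in Q^\sharp$ such that for each abstract property $q^\sharp \in X^\sharp$,
we have: $q^\sharp \sqsubseteq \sqcup(X^\sharp)$;
4. the component $\bot \in Q^\sharp$ is an abstract property such that $\gamma(\bot) = \emptyset$;
5. the component $I^\sharp$ is an element of the set $Q^\sharp$ such that $\{G_0\} \subseteq \gamma(I^\sharp)$;
6. the component $t^\sharp$ is a function mapping each pair $(q^\sharp, r) \in Q^\sharp \times \mathcal{R}$ to an abstract
property $t^\sharp(q^\sharp, r) \in Q^\sharp$ such that: $\forall q^\sharp \in Q^\sharp$, $\forall q \in \gamma(q^\sharp)$, $\forall r \in \mathcal{R}$, $\forall q' \in Q$, we
have $q' \in \gamma(t^\sharp(q^\sharp, r))$ whenever $q \xrightarrow{r} q'$;
7. the component $\nabla : Q^\sharp \times Q^\sharp \to Q^\sharp$ satisfies both following properties:
(a) $\forall q_1^\sharp, q_2^\sharp \in Q^\sharp$, $q_1^\sharp \sqsubseteq q_1^\sharp \nabla q_2^\sharp$ and $q_2^\sharp \sqsubseteq q_1^\sharp \nabla q_2^\sharp$,
(b) $\forall (q_n^\sharp)_{n \in \mathbb{N}} \in (Q^\sharp)^{\mathbb{N}}$, the sequence $(q_n^\nabla)_{n \in \mathbb{N}}$ that is defined as $q_0^\nabla = q_0^\sharp$ and
$q_{n+1}^\nabla = q_n^\nabla \nabla q_{n+1}^\sharp$, for every integer $n \in \mathbb{N}$, is ultimately stationary.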


The set $Q^\sharp$ is an abstract domain. It captures the properties of interest, and
abstracts away the others. Each property $q^\sharp \in Q^\sharp$ is mapped to the set of the
concrete states $\gamma(q^\sharp)$ which satisfy this property by the means of the concretization
function $\gamma$. The pre-order $\sqsubseteq$ describes the amount of information which is
known about the properties that we approximate. We use a pre-order to allow
some concrete properties to be described by several unrelated abstract elements.
The abstract union $\sqcup$ is used to gather the information described by a finite number
of abstract elements. It may not necessarily compute the least upper bound
of a finite set of abstract elements (this least bound may not even exist). The
abstract element $\bot$ provides the basis for abstract iterations. The concretization
function is strict, which means that it maps the element $\bot$ to the empty set.
The abstract property $I^\sharp$ is satisfied by the initial state. The function $t^\sharp$ is used
to mimic concrete rewriting steps in the abstract. The operator $\nabla$ is called a
widening. It ensures the convergence of the analysis in finitely many iterations.

Given an abstraction $(Q^\sharp, \sqsubseteq, \gamma, \sqcup, \bot, I^\sharp, t^\sharp, \nabla)$, the abstract counterpart $F^\sharp$
to F is defined as $F^\sharp(q^\sharp) = \sqcup(\{q^\sharp, I^\sharp\} \cup \{t^\sharp(q^\sharp, r) \mid r \in \mathcal{R}\})$. The function $F^\sharp$
satisfies the soundness condition $\forall q^\sharp \in Q^\sharp, [F \circ \gamma](q^\sharp) \subseteq [\gamma \circ F^\sharp](q^\sharp)$. Following
[7], we compute a sound and decidable approximation of our abstract semantics
by using the widening operator $\nabla$. The abstract iteration [10,11] of $F^\sharp$ is defined
by the following induction: $F_0^\nabla = \bot$ and, for each integer $n \in \mathbb{N}$, $F_{n+1}^\nabla = F_n^\nabla$
whenever $F^\sharp(F_n^\nabla) \sqsubseteq F_n^\nabla$, and $F_{n+1}^\nabla = F_n^\nabla \nabla F^\sharp(F_n^\nabla)$ otherwise.


Theorem 2 (Termination and soundness). The abstract iteration is ultimately
stationary and its limit $F^\nabla$ satisfies $C(M) \subseteq \gamma(F^\nabla)$.


Proof. By construction, $F^\sharp(F^\nabla) \sqsubseteq F^\nabla$. Since $\gamma$ is monotonic, it follows that
$\gamma(F^\sharp(F^\nabla)) \subseteq \gamma(F^\nabla)$. Since $F \circ \gamma \subseteq \gamma \circ F^\sharp$, we have $F(\gamma(F^\nabla)) \subseteq \gamma(F^\nabla)$. So $\gamma(F^\nabla)$ is a
post-fixpoint of F. By [42], we have $\mathrm{lfp}\, F \subseteq \gamma(F^\nabla)$.
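
The abstract iteration can be tried out on the classical interval domain, used here as a stand-in for the abstract domain (the paper's own domain is introduced later). In this sketch `None` represents the bottom element, intervals are pairs of bounds, and the toy model has a single counter that starts at 0 and is incremented while it is at most 2; all names are assumptions made for the illustration.

```python
import math


def leq(a, b):
    """Interval inclusion; None is the bottom element."""
    if a is None:
        return True
    if b is None:
        return False
    return b[0] <= a[0] and a[1] <= b[1]


def join(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def widen(a, b):
    """Standard interval widening: unstable bounds jump to infinity."""
    if a is None:
        return b
    if b is None:
        return a
    return (a[0] if a[0] <= b[0] else -math.inf,
            a[1] if a[1] >= b[1] else math.inf)


def f_sharp(x):
    """Abstract counterpart of F for the toy model (initial value 0, +1 while <= 2)."""
    guarded = meet(x, (-math.inf, 2))
    stepped = None if guarded is None else (guarded[0] + 1, guarded[1] + 1)
    return join(join(x, (0, 0)), stepped)


def abstract_iteration(f, bottom=None):
    x = bottom
    while not leq(f(x), x):
        x = widen(x, f(x))
    return x


print(abstract_iteration(f_sharp))
```

The result over-approximates the exact range 0..3 of the counter: it is sound, but the widening loses the upper bound, which illustrates why the analysis is not complete.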


4.3 Coalescent Product 


Two abstractions may be combined pair-wise to form a new one. The result is a 
coalescent product that defines a mutual induction over both abstractions. 


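Definition 7 (coalescent product). The coalescent product between two
abstractions $(Q_1^\sharp, \sqsubseteq_1, \gamma_1, \sqcup_1, \bot_1, I_1^\sharp, t_1^\sharp, \nabla_1)$ and $(Q_2^\sharp, \sqsubseteq_2, \gamma_2, \sqcup_2, \bot_2, I_2^\sharp, t_2^\sharp, \nabla_2)$ is
defined as the tuple $(Q^\sharp, \sqsubseteq, \gamma, \sqcup, \bot, I^\sharp, t^\sharp, \nabla)$ where


1. $Q^\sharp = Q_1^\sharp \times Q_2^\sharp$;
2. $\sqsubseteq$, $\sqcup$, $\bot$, and $\nabla$ are defined pair-wise;
3. $\gamma$ maps every pair $(q_1^\sharp, q_2^\sharp)$ to the meet $\gamma_1(q_1^\sharp) \cap \gamma_2(q_2^\sharp)$ of their respective
concretizations;
4. $I^\sharp = (I_1^\sharp, I_2^\sharp)$;
5. $t^\sharp$ maps every pair $((q_1^\sharp, q_2^\sharp), r) \in Q^\sharp \times \mathcal{R}$ made of a pair of abstract properties
and a rule to the abstract property $(t_1^\sharp(q_1^\sharp, r), t_2^\sharp(q_2^\sharp, r))$ whenever $t_1^\sharp(q_1^\sharp, r) \neq \bot_1$
and $t_2^\sharp(q_2^\sharp, r) \neq \bot_2$, and to the pair $(\bot_1, \bot_2)$ otherwise.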


Theorem 3 (Soundness of the coalescent product). The coalescent product
of two abstractions is an abstraction as well.


We notice that if either abstraction proves that the precondition
of a rule is not satisfiable, then this rule is discarded in the other abstraction
(hence the term coalescent). By mutual induction, the composite abstraction
may detect which rules may be safely discarded along the iterations of the
analysis.
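
The transfer function of the coalescent product is easy to express as a higher-order function. In the sketch below, the two component domains (an interval for a counter, and a set of possible states for one site) and their transfer functions `t_counter` and `t_site` are toy assumptions; only `coalescent_transfer` mirrors item 5 of Definition 7.

```python
def coalescent_transfer(t1, t2, bottom1, bottom2):
    """Combine two transfer functions; bottom in one component forces bottom in both."""
    def t(pair, rule):
        q1, q2 = pair
        r1, r2 = t1(q1, rule), t2(q2, rule)
        if r1 == bottom1 or r2 == bottom2:
            return (bottom1, bottom2)
        return (r1, r2)
    return t


def t_counter(interval, rule):
    # Toy domain 1: the rule always applies and increments the counter by 1.
    return None if interval is None else (interval[0] + 1, interval[1] + 1)


def t_site(states, rule):
    # Toy domain 2: the rule requires the state "u" and produces the state "p".
    return frozenset({"p"}) if "u" in states else frozenset()


t = coalescent_transfer(t_counter, t_site, None, frozenset())
print(t(((0, 0), frozenset({"u"})), "r"))
print(t(((0, 0), frozenset({"p"})), "r"))
```

In the second call, the site abstraction proves that the precondition cannot hold, so the counter component is discarded as well even though `t_counter` alone would have produced an interval.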

We may now define an analysis modularly with respect to the class of considered
properties. We use the coalescent product to extend the existing static
analyzer KaSa [5] with a new abstraction dedicated to the range of counters.


5 Numerical Abstraction


Now we specialize our generic abstraction to detect and prove safe bounds on
the range of counters. In general, this requires relating the value of the counters
to the state of other sites. Our approach consists in translating each protein
configuration into a vector of integers and in abstracting each rule by
its potential effect on these vectors. We obtain an integer linear programming
problem that we will solve by choosing an appropriate abstract domain.

The set of convex parts of $\mathbb{Z}$ is written as $\mathbb{I}_{\mathbb{Z}}$. We assume that guards on
counters are elements of $\mathbb{I}_{\mathbb{Z}}$ and that each update function either sets counters to
a constant value, or increments/decrements counters by a constant value.


5.1 Encoding States and Preconditions 


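We propose to translate each agent into a set of numerical constraints. A protein
of type A is associated with one variable $\chi_i^\lambda$ for each binding site i and each
binding state $\lambda$, one variable $\chi_i^\iota$ for each property site i and each internal state
identifier $\iota$, and one variable $\mathrm{val}_i$ for each counter i.


Definition 8 (numerical variables). Let $A \in \Sigma_{ag}$ be an agent type. We define
the set $\mathrm{Var}_A$ as the set of variables $\mathrm{Var}_A^{lnk} \cup \mathrm{Var}_A^{int} \cup \mathrm{Var}_A^{c}$ where:

1. $\mathrm{Var}_A^{lnk} = \{\chi_i^\lambda \mid i \in \Sigma^{lnk}_{\text{ag-st}}(A), \lambda \in \{\dashv\} \cup \{(A', i') \mid A' \in \Sigma_{ag}, i' \in \Sigma^{lnk}_{\text{ag-st}}(A')\}\}$;
2. $\mathrm{Var}_A^{int} = \{\chi_i^\iota \mid i \in \Sigma^{int}_{\text{ag-st}}(A), \iota \in \Sigma_{int}\}$;

3. $\mathrm{Var}_A^{c} = \{\mathrm{val}_i \mid i \in \Sigma^{c}_{\text{ag-st}}(A)\}$.


Intuitively, variables of the form $\chi_i^\lambda$ (resp. $\chi_i^\iota$) take the value 1 if the binding
(resp. internal) state of the site i is $\lambda$ (resp. $\iota$), whereas the variables of the form
$\mathrm{val}_i$ take the value of the counter i.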

Each agent of type A may be translated into a function mapping each variable
in the set $\mathrm{Var}_A$ into a subset of the set $\mathbb{Z}$. Such a function is called a guard.


Definition 9 (Encoding of agents). Let G be a site-graph and n be an agent
in $A_G$. We denote by A the type $\mathrm{type}_G(n)$. We define as follows the function
$\mathrm{guard}_G(n)$ from the set $\mathrm{Var}_A$ into the set $\mathbb{I}_{\mathbb{Z}}$:


1. $\mathrm{guard}_G(n)(\chi_i^{\dashv})$ is equal to the singleton {1} whenever $(n,i) \in S_G^{lnk}$ and
$L_G(n,i) = \dashv$, to the singleton {0} whenever $(n,i) \in S_G^{lnk}$ and $L_G(n, i) \neq \dashv$,
and to the set {0,1} whenever $(n,i) \notin S_G^{lnk}$;

2. $\mathrm{guard}_G(n)(\chi_i^{(A',i')})$ is equal to the singleton {1} whenever $(n,i) \in S_G^{lnk}$ and
there exists $n' \in A_G$ such that both conditions $\mathrm{type}_G(n') = A'$ and $L_G(n, i) =
(n',i')$ are satisfied, to the singleton {0} whenever $(n,i) \in S_G^{lnk}$ and either
$L_G(n,i) = \dashv$, or there exist an agent identifier $n'' \in A_G$ and a site name
$i'' \in \Sigma_{site}$ such that $L_G(n,i) = (n'', i'')$ and $(\mathrm{type}_G(n''), i'') \neq (A', i')$, and to the
set {0,1} whenever $(n, i) \notin S_G^{lnk}$ or $L_G(n,i) = -$;

3. $\mathrm{guard}_G(n)(\chi_i^{\iota})$ is equal to the singleton {1} whenever $(n,i) \in S_G^{int}$ and
$p\kappa_G(n,i) = \iota$; to the singleton {0} whenever $(n,i) \in S_G^{int}$ and $p\kappa_G(n,i) \neq
\iota$; and to the set {0,1} whenever $(n,i) \notin S_G^{int}$.

4. $\mathrm{guard}_G(n)(\mathrm{val}_i)$ is equal to the set $c\kappa_G(n,i)$ whenever $(n,i) \in S_G^{c}$, and to the set
$\mathbb{Z}$ otherwise.


The variable $\chi_i^{\dashv}$ takes the value {1} if we know that the site i is free, the
value {0} if we know that it is bound, and the value {0,1} if we do not know
whether the site is free or not. This is the same for binding type: the variable
$\chi_i^{(A',i')}$ takes the value {1} if we know that the site is bound to the site i' of
an agent of type A', the value {0} if we know that this is not the case, and the
value {0,1} otherwise. Property sites work the same way. Lastly, the variable
$\mathrm{val}_i$ takes as value the set attached to the counter or the value $\mathbb{Z}$ if the site is not
mentioned in the agent. We notice that when n is a fully-specified agent of type
A, the function $\mathrm{guard}_G(n)$ maps every variable in the set $\mathrm{Var}_A$ to a singleton.


Example 5 (running example). We provide the translation of the unique agent
of the site-graph $G_1$ (e.g. see Fig. 3(a)) and the one of the unique agent of the
site-graph $G_4$ (e.g. see Fig. 3(d)).

The agent of the site-graph $G_1$ is translated as follows:


$\chi_a^{\circ} = \{1\}$;    $\chi_a^{\bullet} = \{0\}$;
$\chi_b^{\circ} = \{0,1\}$;  $\chi_b^{\bullet} = \{0,1\}$;
$\chi_c^{\circ} = \{0,1\}$;  $\chi_c^{\bullet} = \{0,1\}$;
$\chi_d^{\circ} = \{0,1\}$;  $\chi_d^{\bullet} = \{0,1\}$;
$\mathrm{val}_x = \{z \in \mathbb{Z} \mid z \leq 2\}$


According to the first two constraints, the site a is unphosphorylated. According
to the next six ones, the sites b, c, and d have an unspecified state. According
to the last constraint, the value of the counter must be less than or equal to 2.

The translation of the agent of the site-graph $G_4$ is obtained the same way:


$\chi_a^{\circ} = \{1\}$;  $\chi_a^{\bullet} = \{0\}$;
$\chi_b^{\circ} = \{0\}$;  $\chi_b^{\bullet} = \{1\}$;
$\chi_c^{\circ} = \{0\}$;  $\chi_c^{\bullet} = \{1\}$;
$\chi_d^{\circ} = \{1\}$;  $\chi_d^{\bullet} = \{0\}$;
$\mathrm{val}_x = \{2\}$


This means that the sites b and c are phosphorylated while the sites a and d
are not. According to the last constraint, the value of the counter is equal to 2.
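
The translation of Example 5 can be reproduced on a toy representation restricted to property sites and one counter. The function `guard`, the state names `"u"` and `"p"`, and the use of a pair `(lo, hi)` with `None` for a missing bound are assumptions of this sketch.

```python
STATES = ("u", "p")  # hypothetical names: unphosphorylated / phosphorylated


def guard(agent, sites=("a", "b", "c", "d")):
    """Translate a toy agent into numerical constraints.

    agent maps the sites it documents to their state, and optionally "counter"
    to a pair (lo, hi) where None stands for a missing bound.
    """
    constraints = {}
    for site in sites:
        for state in STATES:
            if site not in agent:
                constraints[(site, state)] = {0, 1}  # state not documented
            elif agent[site] == state:
                constraints[(site, state)] = {1}
            else:
                constraints[(site, state)] = {0}
    # A counter that is not mentioned may take any value in Z.
    constraints["val"] = agent.get("counter", (None, None))
    return constraints


g1 = guard({"a": "u", "counter": (None, 2)})
g4 = guard({"a": "u", "b": "p", "c": "p", "d": "u", "counter": (2, 2)})
print(g1[("a", "u")], g1[("b", "p")], g1["val"])
print(g4[("b", "p")], g4[("d", "p")], g4["val"])
```

As noted in the text, the fully specified agent is mapped to singletons only, whereas the partially specified one keeps the set {0, 1} for undocumented sites.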


5.2 Encoding Rules 


In Kappa, a rule may be applied only when its precondition is satisfied. Moreover,
the application of a rule modifies the state of some sites in agents. We translate
each rule into a tuple of guards that encodes its precondition, a set of non-invertible
assignments (when a site is given a new state that does not depend
on the former one), and a set of invertible assignments (when the new state
of a site depends on the previous one). Such a distinction is important as we
want to establish relationships among the value of some variables [32]: a non-invertible
assignment completely hides the former value of a variable. This is not
the case with invertible assignments for which relationships may be propagated
more easily. The agents that are created (which have no precondition) and the
ones that are removed (which disappear) have a special treatment.

Definition 10 (Encoding of rules). Each rule $r : L \hookleftarrow D \xrightarrow{(h_e,h_g)} R$ is
associated with the tuple $(\mathrm{pre}_r, \text{not-invert}_r, \mathrm{invert}_r, \mathrm{new}_r)$ where:


1. $\mathrm{pre}_r$ maps every agent $n \in A_L$ in the left hand side of the rule r to its guard
$\mathrm{guard}_L(n)$;

2. $\text{not-invert}_r$ maps every agent $n \in A_D$ and every variable $v \in \mathrm{Var}_{\mathrm{type}_D(n)}$
such that the set $\mathrm{guard}_R(h_e(n))(v)$ is a singleton and $\mathrm{guard}_R(h_e(n))(v) \neq
\mathrm{guard}_L(n)(v)$ to the unique element of the set $\mathrm{guard}_R(h_e(n))(v)$.

3. $\mathrm{invert}_r$ maps every agent $n \in A_D$ and every variable $v \in \mathrm{Var}_{\mathrm{type}_D(n)}$ of the form
$\mathrm{val}_i$ such that the set $\mathrm{guard}_R(h_e(n))(v)$ is not a singleton and $h_g(n,i)$ is a function of
the form $[z \in \mathbb{Z} \mapsto z + c]$ with $c \in \mathbb{Z}$, to the integer c.

4. $\mathrm{new}_r$ maps every agent $n' \in A_R$ such that there is no agent $n \in A_D$ satisfying
$h_e(n) = n'$ to the guard $\mathrm{guard}_R(n')$.


Example 6 (running example). The encoding of the rule of Fig. 6(a) is given as
follows:


- the function $\mathrm{pre}_r$ maps the agent 1 to the following set of constraints:


$\chi_a^{\circ} = \{1\}$;    $\chi_a^{\bullet} = \{0\}$;
$\chi_b^{\circ} = \{0,1\}$;  $\chi_b^{\bullet} = \{0,1\}$;
$\chi_c^{\circ} = \{0,1\}$;  $\chi_c^{\bullet} = \{0,1\}$;
$\chi_d^{\circ} = \{0,1\}$;  $\chi_d^{\bullet} = \{0,1\}$;
$\mathrm{val}_x = \{z \in \mathbb{Z} \mid z \leq 2\}$


- the function $\text{not-invert}_r$ maps the pair $(1, \chi_a^{\circ})$ to the value 0, and the pair
$(1, \chi_a^{\bullet})$ to the value 1;

- the function $\mathrm{invert}_r$ maps the pair $(1, \mathrm{val}_x)$ to the successor function;

- the function $\mathrm{new}_r$ is the function with the empty domain.


The guard specifies that the site a must be unphosphorylated and the value
of the counter less than or equal to 2. Applying the rule modifies the value of three
variables. The site a gets phosphorylated. This is a non-invertible modification
that sets the variable $\chi_a^{\circ}$ to the constant value 0 and the variable $\chi_a^{\bullet}$ to the
constant value 1. The counter x is incremented. This is an invertible modification
that is encoded by incrementing the value of the variable $\mathrm{val}_x$.
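
To make the three components of the encoding concrete, the following sketch stores the rule of Example 6 as a dictionary and applies it to a concrete valuation of the variables. The variable names (`"a_u"`, `"a_p"`, `"val_x"`), the dictionary layout, and `apply_encoded_rule` are assumptions for illustration; unconstrained variables (guard {0, 1}) are simply left out of the precondition.

```python
import math

# Encoding of the rule of Example 6, with intervals (lo, hi) as guards.
rule = {
    "pre": {"a_u": (1, 1), "a_p": (0, 0), "val_x": (-math.inf, 2)},
    "not_invert": {"a_u": 0, "a_p": 1},  # constant (non-invertible) assignments
    "invert": {"val_x": 1},              # shifts (invertible assignments)
}


def apply_encoded_rule(rule, rho):
    """Apply an encoded rule to a concrete valuation; None if the guard fails."""
    for variable, (lo, hi) in rule["pre"].items():
        if not lo <= rho[variable] <= hi:
            return None
    result = dict(rho)
    for variable, shift in rule["invert"].items():
        result[variable] += shift
    result.update(rule["not_invert"])
    return result


rho = {"a_u": 1, "a_p": 0, "b_u": 0, "b_p": 1, "val_x": 2}
print(apply_encoded_rule(rule, rho))
```

The shift on `val_x` can be undone by subtracting 1, whereas the former values of `a_u` and `a_p` cannot be recovered from the result alone, which is the distinction drawn in the text.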


5.3 Generic Numerical Abstract Domain 


We are now ready to define a generic numerical abstraction. 


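Definition 11 (Numerical domain). A numerical abstract domain is a family
$(A_A^N)_{A \in \Sigma_{ag}}$ of tuples $(D_A^N, \sqsubseteq_A^N, \gamma_A^N, \sqcup_A^N, \bot_A^N, \top_A^N, g_A^N, \mathrm{forget}_A^N, \delta_A^N, \nabla_A^N)$ that
satisfy the following conditions, for every agent type $A \in \Sigma_{ag}$:

1. the pair $(D_A^N, \sqsubseteq_A^N)$ is a pre-order;

2. the component $\gamma_A^N : D_A^N \to \wp(\mathbb{Z}^{\mathrm{Var}_A})$ is a monotonic function;

3. the component $\sqcup_A^N : \wp_{finite}(D_A^N) \to D_A^N$ is an operator such that $\forall X^\sharp \in
\wp_{finite}(D_A^N)$, $\forall \rho^\sharp \in X^\sharp$, $\rho^\sharp \sqsubseteq_A^N \sqcup_A^N(X^\sharp)$;

4. the component $\bot_A^N$ is an element in the set $D_A^N$ such that $\gamma_A^N(\bot_A^N) = \emptyset$;

5. the component $\top_A^N$ is an element in the set $D_A^N$ such that $\gamma_A^N(\top_A^N) = \mathbb{Z}^{\mathrm{Var}_A}$;

6. the component $g_A^N$ is a function mapping each pair $(g, \rho^\sharp)$ where g is a guard
and $\rho^\sharp$ an abstract property in $D_A^N$ to an abstract element in $D_A^N$ such that
the set $\gamma_A^N(g_A^N(g, \rho^\sharp))$ contains at least each function $\rho \in \gamma_A^N(\rho^\sharp)$ that verifies
the condition $\rho(v) \in g(v)$ for every variable $v \in \mathrm{Var}_A$;

7. the component $\mathrm{forget}_A^N$ maps each pair $(V, \rho^\sharp) \in \wp(\mathrm{Var}_A) \times D_A^N$ to an abstract
property $\mathrm{forget}_A^N(V, \rho^\sharp) \in D_A^N$, the concretization $\gamma_A^N(\mathrm{forget}_A^N(V, \rho^\sharp))$ of which
contains at least each function $\rho \in \mathbb{Z}^{\mathrm{Var}_A}$ such that there exists a function
$\rho' \in \gamma_A^N(\rho^\sharp)$ satisfying $\rho(v) = \rho'(v)$ for each variable $v \in \mathrm{Var}_A \setminus V$;

8. the component $\delta_A^N$ maps each pair $(t, \rho^\sharp) \in \mathbb{Z}^{\mathrm{Var}_A} \times D_A^N$ to an abstract property
$\delta_A^N(t, \rho^\sharp) \in D_A^N$, such that for each function $\rho \in \gamma_A^N(\rho^\sharp)$, the function mapping each
variable $v \in \mathrm{Var}_A$ to the value $\rho(v) + t(v)$ belongs to the set $\gamma_A^N(\delta_A^N(t, \rho^\sharp))$;

9. the component $\nabla_A^N : D_A^N \times D_A^N \to D_A^N$ satisfies both following properties:
(a) $\forall \rho_1^\sharp, \rho_2^\sharp \in D_A^N$, $\rho_1^\sharp \sqsubseteq_A^N \rho_1^\sharp \nabla_A^N \rho_2^\sharp$ and $\rho_2^\sharp \sqsubseteq_A^N \rho_1^\sharp \nabla_A^N \rho_2^\sharp$,
(b) $\forall (\rho_n^\sharp)_{n \in \mathbb{N}} \in (D_A^N)^{\mathbb{N}}$, the sequence $(\rho_n^\nabla)_{n \in \mathbb{N}}$ that is defined as $\rho_0^\nabla = \rho_0^\sharp$ and
$\rho_{n+1}^\nabla = \rho_n^\nabla \nabla_A^N \rho_{n+1}^\sharp$ for every integer $n \in \mathbb{N}$, is ultimately stationary.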


5.4 Numerical Abstraction 


The following theorem explains how to build an abstraction (as defined in Sect. 4)
from a numerical abstract domain. We introduce an operator ↑ to extend the
domain of functions with default values. Given a function f, a value v and a
super-set X of the domain of f, we write ↑^v_X f for the extension of the function f
that maps each element x ∈ X \ dom(f) to the value v. We also write set_A for
the function mapping pairs (f, X♯), where f is a partial function from the set
Var_A into the set of the convex parts of ℤ and X♯ an abstract property in D^N_A,
to the abstract property: g^N_A(↑^ℤ_{Var_A} f, forget^N_A(dom(f), X♯)). The function set_A
forgets all the information about the variables in the domain of the function f,
and reassigns their range to their image by f in the abstract.
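
The extension operator can be illustrated with a minimal sketch, assuming partial functions are represented as Python dictionaries. The name `extend` and the variable names `c` and `d` are hypothetical and are not taken from the Kappa implementation.

```python
def extend(f, X, default):
    """Extend the partial function f (a dict) to the super-set X of its domain.

    Every element of X outside dom(f) is mapped to `default`.
    """
    if not set(f) <= set(X):
        raise ValueError("X must be a super-set of the domain of f")
    return {x: f.get(x, default) for x in X}


# A translation defined on one variable only, extended with the default 0
# so that the remaining variable is left unchanged.
shift = extend({"c": 1}, {"c", "d"}, 0)
```

Choosing 0 as the default value matches the use of the operator on invertible transformations, where variables outside the domain must remain unchanged.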


Theorem 4. Let (D^N_A, ⊑^N_A, γ^N_A, ⊔^N_A, ⊥^N_A, ⊤^N_A, g^N_A, forget^N_A, δ^N_A, ∇^N_A)_{A∈Σ_ag}
be a numerical abstract domain. The tuple (Q♯, ⊑, γ, ⊔, ⊥, I♯, t♯, ∇) that is defined
by:

1. the component Q♯ is the set of the functions mapping each agent type A ∈ Σ_ag
to an abstract property in the set D^N_A;

2. the component γ is the function mapping a function X♯ ∈ Q♯ to the set of
the fully specified site-graphs G such that for each agent n ∈ A_G, we have
guard_G(n) ∈ γ^N_{type_G(n)}(X♯(type_G(n)));

3. the components ⊑, ⊔, ⊥ are defined component-wise;

4. the component I♯ maps each agent type A ∈ Σ_ag to the abstract property
⊔^N_A { g^N_A(guard_{G_0}(n), ⊤^N_A) | n ∈ A_{G_0} };

5. the component t♯ is a function mapping each pair (X♯, r) ∈ Q♯ × R (we write
r : L ← D → R) to the element ⊥ whenever there exists an agent
n in A_L such that g^N(pre_r(n), X♯(type_L(n))) = ⊥^N, and, otherwise, to the
function mapping each agent type A to the numerical property:

⊔^N_A ({X♯(A)} ∪ fresh(r, A) ∪ updated(r, A, X♯)),

with:
- fresh(r, A) the set of the numerical abstract elements g^N_A(new_r(n), ⊤^N_A) for
every n ∈ dom(new_r) such that type_R(n) = A;
- and updated(r, A, X♯) the set of the elements:
set_A(not-invert_r(n), δ^N_A(↑^0_{Var_A} invert_r(n), g^N_A(pre_r(n), X♯(A))))
for each agent n ∈ A_D with type_D(n) = A;

is a generic abstraction.


Most of the constructions of the abstraction are standard. The expression
g^N(pre_r(n), X♯(type_L(n))) refines the abstract information about the potential
configurations of the n-th agent in the left-hand side of the rule, by taking into
account its precondition. Whenever a bottom element is obtained for at least
one agent, the precondition of the rule is not satisfiable and the rule is discarded
at this moment of the iteration. Otherwise, the information about each
agent is updated. Starting from the result of the refinement of the abstract element
by the precondition, the function δ^N applies the invertible transformations
↑^0 invert_r(n) (the function ↑^0 extends the domain of the function invert_r(n)
by specifying that the variables not in the domain of this function remain
unchanged), and the function set_A applies the non-invertible one, not-invert_r(n).

The domain of intervals [8] and the one of affine relationships [32] provide
all the primitives requested by Definition 11. We use a product of them, where
all primitives are defined pair-wise, except the guards, which refine their output by
using the algorithm that is described in [23]. We use widening with thresholds [2]
for intervals so as to avoid infinite bounds when possible. This way we obtain a
domain where all operations are cubic with respect to the number of variables.

This is a very good trade-off. A relational domain is required. Other relational
domains are either too imprecise [37], or too costly [13], or both [27,38].
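
To make the role of thresholds concrete, here is a minimal sketch of a widening with thresholds on intervals. It is an illustration under stated assumptions, not the Kappa implementation: intervals are assumed to be pairs `(lo, hi)`, and the function name `widen` and the threshold values are hypothetical.

```python
import math


def widen(old, new, thresholds):
    """Widen the interval `old` by `new`.

    A stable bound is kept. An unstable bound jumps to the nearest threshold
    that covers it, and to infinity only when no threshold does.
    """
    (old_lo, old_hi), (new_lo, new_hi) = old, new
    if new_lo >= old_lo:
        lo = old_lo
    else:
        lo = max((t for t in thresholds if t <= new_lo), default=-math.inf)
    if new_hi <= old_hi:
        hi = old_hi
    else:
        hi = min((t for t in thresholds if t >= new_hi), default=math.inf)
    return (lo, hi)


# With a threshold at 5, a growing upper bound first stops at 5.
first = widen((0, 0), (0, 1), [0, 5])
# Once the bound exceeds every threshold, it becomes infinite.
second = widen((0, 5), (0, 6), [0, 5])
```

Since the set of thresholds is finite, each bound can change only finitely many times, so the sequence of widened intervals is ultimately stationary.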


5.5 Benchmarks 


We run our analysis on the family of models of Sect. 1 for n ranging between 1
and 25. For each version of the model, the protein is made of n phosphorylation
sites and a counter. Moreover, our analysis always discovers that the counter
ranges between 0 and n. CPU time is plotted in Fig. 10.
[Figure: static analysis efficiency. x-axis: number of phosphorylation sites (0 to 25); y-axis: CPU time (in seconds).]

Fig. 10. Efficiency of the static analysis for the example in Sect. 1 with n ranging
between 1 and 25. Every analysis has successfully computed the exact range of the
counter. The analysis has been performed on a MacBook Pro with a 2.8 GHz Intel Core
i7, 16 GB of RAM, running under macOS High Sierra version 10.13.6.


6 Conclusion 


When potential protein transformations depend on the number of sites satisfy- 
ing a given property, counters offer a convenient way to describe generic mecha- 
nisms while avoiding the explosion in the number of rules. We have extended the 
semantics of Kappa to deal with counters. We have proposed some encodings to 
remove counters while preserving the performance of the Kappa simulator. In 
particular, graphs remain rigid and the number of rules remains the same. Then,
we have introduced a static analysis to bound the range of counters. 

It is quite common to find proteins with more than 40 phosphorylation sites. 
Without our contributions, the modeler has no choice but to assume these 
proteins to be active only when all their sites are phosphorylated. This is a 
harsh simplification. Modeling simplifications are usually done not only because 
detailed knowledge is missing, but also because corresponding models cannot be 
described, executed, or analyzed efficiently. Yet these simplifications are done 
without any clue of their impact on the behavior of the systems. By providing 
ways of describing and handling some complex details, we offer the modelers the 
means to incorporate these details and to test empirically their impact. 

Our framework is fully integrated within the Kappa modeling platform which 
is open-source and usable online (https://kappalanguage.org). It is worth noting 
that we have taken two radically different approaches to deal with counters in 
simulation and in static analysis. Encodings are good for simulation, but they 
tend to obfuscate the properties of interest, hence damaging drastically the capa- 
bility of the static analysis to infer useful properties about them. The extension of 
the categorical semantics provides a parsimonious definition of causality between 
computation steps, as well as means to reason symbolically on the behavior of 
the number of occurrences of patterns. For future work, we will extend existing
decision procedures [14,15] that compute minimal causal traces to cope with
counters. It is very likely that a third approach will be required. We suggest
using the traces obtained by simulation, then translating the counters in these traces
into equivalent sites, and applying existing decision procedures to the traces
obtained this way.
Abstract. Big-step and small-step are two popular flavors of operational
semantics. Big-step is often seen as a more natural transcription
of informal descriptions, as well as being more convenient for some appli- 
cations such as interpreter generation or optimization verification. Small- 
step allows reasoning about non-terminating computations, concurrency 
and interactions. It is also generally preferred for reasoning about type 
systems. Instead of having to manually specify equivalent semantics in 
both styles for different applications, it would be useful to choose one 
and derive the other in a systematic or, preferably, automatic way. 
Transformations of small-step semantics into big-step have been inves- 
tigated in various forms by Danvy and others. However, it appears that a 
corresponding transformation from big-step to small-step semantics has 
not had the same attention. We present a fully automated transformation 
that maps big-step evaluators written in direct style to their small-step 
counterparts. Many of the steps in the transformation, which include 
CPS-conversion, defunctionalisation, and various continuation manipu- 
lations, mirror those used by Danvy and his co-authors. For many stan- 
dard languages, including those with either call-by-value or call-by-need 
and those with state, the transformation produces small-step semantics 
that are close in style to handwritten ones. We evaluate the applicability 
and correctness of the approach on 20 languages with a range of features. 


Keywords: Structural operational semantics · Big-step semantics ·
Small-step semantics · Interpreters · Transformation ·
Continuation-passing style · Functional programming


1 Introduction 


Operational semantics allow language designers to precisely and concisely spec- 
ify the meaning of programs. Such semantics support formal type soundness 
proofs [29], give rise (sometimes automatically) to simple interpreters [15,27] 
and debuggers [14], and document the correct behavior for compilers. There are 
two popular approaches for defining operational semantics: big-step and small-step.
Big-step semantics (also referred to as natural or evaluation semantics)
relate initial program configurations directly to final results in one “big” evalu- 
ation step. In contrast, small-step semantics relate intermediate configurations 
consisting of the term currently being evaluated and auxiliary information. The 
initial configuration corresponds to the entire program, and the final result, if 
there is one, can be obtained by taking the transitive-reflexive closure of the 
small-step relation. Thus, computation progresses as a series of “small steps.” 
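
The contrast between the two styles can be sketched on a toy language of additions. This is a hypothetical illustration, not one of the languages studied in this paper: terms are assumed to be tagged tuples `("num", n)` and `("add", l, r)`, and the names `big_step`, `small_step` and `run` are made up for the example.

```python
def big_step(e):
    """Relate a term directly to its final result."""
    if e[0] == "num":
        return e[1]
    _, left, right = e  # ("add", left, right)
    return big_step(left) + big_step(right)


def small_step(e):
    """Perform exactly one step on a term that is not yet a number."""
    _, left, right = e
    if left[0] != "num":
        return ("add", small_step(left), right)   # congruence rule (left)
    if right[0] != "num":
        return ("add", left, small_step(right))   # congruence rule (right)
    return ("num", left[1] + right[1])            # computation rule


def run(e):
    """Iterate small_step until a final result is reached; keep the trace."""
    trace = [e]
    while e[0] != "num":
        e = small_step(e)
        trace.append(e)
    return trace


term = ("add", ("add", ("num", 1), ("num", 2)), ("num", 3))
trace = run(term)
```

The loop in `run` plays the role of the transitive-reflexive closure: the last element of the trace agrees with the result of `big_step`, while the intermediate elements are the configurations that only the small-step style exposes.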

The two styles have different strengths and weaknesses, making them suitable 
for different purposes. For example, big-step semantics naturally correspond to 
definitional interpreters [23], meaning many big-step semantics can essentially 
be transliterated into a reasonably efficient interpreter in a functional language. 
Big-step semantics are also more convenient for verifying program optimizations 
and compilation — using big-step, semantic preservation can be verified (for ter- 
minating programs) by induction on the derivation [20, 22]. 

In contrast, small-step semantics are often better suited for stepping through 
the evaluation of an example program, and for devising a type system and prov- 
ing its soundness via the classic syntactic method using progress and preservation 
proofs [29]. As a result, researchers sometimes develop multiple semantic spec- 
ifications and then argue for their equivalence [3,20,21]. In an ideal situation, 
the specifier writes down a single specification and then derives the others. 

Approaches to deriving big-step semantics from a small-step variant have 
been investigated on multiple occasions, starting from semantics specified as 
either interpreters or rules [4,7,10,12,13]. An obvious question is: what about 
the reverse direction? 

This paper presents a systematic, mechanised transformation from a big-step 
interpreter into its small-step counterpart. The overall transformation consists 
of multiple stages performed on an interpreter written in a functional program- 
ming language. For the most part, the individual transformations are well known. 
The key steps in this transformation are to explicitly represent control flow as 
continuations, to defunctionalise these continuations to obtain a datatype of rei- 
fied continuations, to “tear off” recursive calls to the interpreter, and then to 
return the reified continuations, which represent the rest of the computation. 
This process effectively produces a stepping function. The remaining work con- 
sists of finding translations from the reified continuations to equivalent terms in 
the source language. If such a term cannot be found, we introduce a new term 
constructor. These new constructors correspond to the intermediate auxiliary 
forms commonly found in handwritten small-step definitions. 

We define the transformations on our evaluator definition language — an 
extension of λ-calculus with call-by-value semantics. The language is untyped
and, crucially, includes tagged values (variants) and a case analysis construct for 
building and analysing object language terms. Our algorithm takes as input a 
big-step interpreter written in this language in the usual style: a main function 
performing case analysis on a top-level term constructor and recursively call- 
ing itself or auxiliary functions. As output, we return the resulting small-step 
interpreter, which we can “pretty-print” as a set of small-step rules in the usual
style. Hence our algorithm provides a fully automated path from a restricted 
class of big-step semantic specifications written as interpreters to corresponding 
small-step versions. 

To evaluate our algorithm, we have applied it to 20 different languages with 
various features, including languages based on call-by-name and call-by-value 
λ-calculi, as well as a core imperative language. We extend these base languages
with conditionals, loops, and exceptions. 

We make the following contributions: 


– We present a multi-stage, automated transformation that maps any deterministic
big-step evaluator into a small-step counterpart. Section 2 gives an
overview of this process. Each stage in the transformation is performed on
our evaluator definition language, an extended call-by-value λ-calculus.
Each stage in the transformation is familiar and principled. Section 4 gives a
detailed description.

– We have implemented the transformation process in Haskell and evaluate
it on a suite of 20 representative languages in Section 5. We argue that the
resulting small-step evaluation rules closely mirror what one would expect
from a manually written small-step specification.

– We observe that the same process with minimal modifications can be used to
transform a big-step semantics into its pretty-big-step [6] counterpart.


2 Overview 


In this section, we provide an overview of the transformation steps on a simple 
example language. The diagram in Fig. 1 shows the transformation pipeline. As 
the initial step, we first convert the input big-step evaluator into continuation- 
passing style (CPS). We limit the conversion to the eval function itself and leave all 
other functions in direct style. The resulting continuations take a value as input 
and advance the computation. In the generalization step, we modify these con- 
tinuations so that they take an arbitrary term and evaluate it to a value before 
continuing as before. With this modification, each continuation handles both the 
general non-value case and the value case itself. The next stage lifts a carefully cho- 
sen set of free variables as arguments to continuations, which allows us to define all 
of them at the same scope level. After generalization and argument lifting, we can 
invoke continuations directly to switch control, instead of passing them as argu- 
ments to the eval function. Next we defunctionalize the continuations, converting 
them into a set of tagged values together with an apply function capturing their 
meaning. This transformation enables the next step, in which we remove recursive 
tail-calls to apply. This allows us to interrupt the interpreter and make it return 
a continuation or a term: effectively, it yields a stepping function, which is the 
essence of a small-step semantics. The remainder of the pipeline converts contin- 
uations to terms, performs simplifications, and then converts the CPS evaluator 
back to direct style to obtain the final small-step interpreter. This interpreter can 
be pretty-printed as a set of small-step rules. 
Fig. 1. Transformation overview


Our example language is a λ-calculus with call-by-value semantics. Fig. 2
gives its syntax and big-step rules. We use environments to give meaning to
variables. The only values in this language are closures, formed by packaging a
λ-abstraction with an environment.


x ∈ Var        ρ ∈ Env = Var → Val


(var)   ρ(x) = v
        ────────────────
        ρ ⊢ var(x) ⇓ v

(lam)   ──────────────────────────────
        ρ ⊢ lam(x, e) ⇓ clo(x, e, ρ)

(app)   ρ ⊢ e1 ⇓ clo(x, e, ρ')     ρ ⊢ e2 ⇓ v2     ρ'[x ↦ v2] ⊢ e ⇓ v
        ──────────────────────────────────────────────────────────────
        ρ ⊢ app(e1, e2) ⇓ v


Fig. 2. Example: Call-by-value λ-calculus, abstract syntax and big-step semantics


We will now give a series of interpreters to illustrate the transformation pro- 
cess. We formally define the syntax of the meta-language in which we write these 
interpreters in Section 3, but we believe for readers familiar with functional pro- 
gramming the language is intuitive enough to not require a full explanation at this 
point. Shaded text highlights (often small) changes to subsequent interpreters. 


Big-Step Evaluator. We start with an interpreter corresponding directly to the 
big-step semantics given in Fig. 2. We represent environments as functions:
the empty environment returns an error for any variable. The body of the eval 
function consists of a pattern match on the top-level language term. Function 
abstractions are evaluated to closures by packaging them with the current envi- 
ronment. The only term that requires recursive calls to eval is application: both 
its arguments are evaluated in the current environment, and then its first argu- 
ment is pattern-matched against a closure, the body of which is then evaluated 
to a value in an extended environment using a third recursive call to eval. 
let empty = λx. error() in


let update x v ρ = λx'. let xx' = (== x x') in if xx' then v else (ρ x') in


let rec eval e ρ =
  case e of {
    val(v) → v |
    var(x) → let v = (ρ x) in v |
    lam(x, e') → clo(x, e', ρ) |
    app(e1, e2) →
      let v1 = (eval e1 ρ) in
      let v2 = (eval e2 ρ) in
      case v1 of {
        clo(x, e', ρ') →
          let ρ'' = (update x v2 ρ') in
          let v = (eval e' ρ'') in
          v
      }
  }
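
For readers who prefer an executable rendition, the evaluator above can be transliterated into Python as a rough sketch. The encoding is an assumption made for this illustration only: terms and closures are tagged tuples, environments are Python functions, and the name `eval_big` is hypothetical.

```python
def empty(x):
    """The empty environment: every lookup is an error."""
    raise KeyError(f"unbound variable: {x}")


def update(x, v, env):
    """Extend env with the binding x -> v."""
    return lambda y: v if y == x else env(y)


def eval_big(e, env):
    tag = e[0]
    if tag == "val":
        return e[1]
    if tag == "var":
        return env(e[1])
    if tag == "lam":
        _, x, body = e
        return ("clo", x, body, env)          # package body with current env
    if tag == "app":
        _, e1, e2 = e
        v1 = eval_big(e1, env)                # first recursive call
        v2 = eval_big(e2, env)                # second recursive call
        if not (isinstance(v1, tuple) and v1[0] == "clo"):
            raise TypeError("operator is not a closure")
        _, x, body, clo_env = v1
        return eval_big(body, update(x, v2, clo_env))  # third recursive call
    raise ValueError(f"unknown term: {tag}")


# (λa. λb. a) applied to 1 and then to 2.
const_fn = ("lam", "a", ("lam", "b", ("var", "a")))
program = ("app", ("app", const_fn, ("val", 1)), ("val", 2))
result = eval_big(program, empty)
```

The three recursive calls in the `app` branch are exactly the points at which the CPS conversion of the next step introduces continuations.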
CPS Conversion. Our first transformation introduces a continuation argument
to eval, capturing the “rest of the computation” [9,26,28]. Instead of returning
the resulting value directly, eval will pass it to the continuation. For our example
we need to introduce three continuations, all of them in the case for app. The
continuation kapp1 captures what remains to be done after evaluating the first
argument of app, kapp2 captures the computation remaining after evaluating the
second argument, and kclo1, the computation remaining after the closure body is
fully evaluated. This final continuation simply applies the top-level continuation
to the resulting value and might seem redundant; however, its utility will become
apparent in the following step. Note that the CPS conversion is limited to the
eval function, leaving any other functions in the program intact.


let rec eval e ρ k =
  case e of {
    val(v) → (k v) |
    var(x) → let v = (ρ x) in (k v) |
    lam(x, e') → (k clo(x, e', ρ)) |
    app(e1, e2) →
      letcont kapp1 v1 =
        letcont kapp2 v2 =
          case v1 of {
            clo(x, e', ρ') →
              let ρ'' = (update x v2 ρ') in
              letcont kclo1 v = (k v) in
              (eval e' ρ'' (λv. (kclo1 v)))
          } in
        (eval e2 ρ (λv2. (kapp2 v2))) in
      (eval e1 ρ (λv1. (kapp1 v1)))
  }
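
The CPS version can likewise be sketched in Python, under the same hypothetical tuple encoding of terms (the name `eval_cps` is an assumption of this illustration). Python does not eliminate tail calls, so the sketch is only suitable for small terms.

```python
def empty(x):
    raise KeyError(f"unbound variable: {x}")


def update(x, v, env):
    return lambda y: v if y == x else env(y)


def eval_cps(e, env, k):
    """Evaluate e and pass the resulting value to the continuation k."""
    tag = e[0]
    if tag == "val":
        return k(e[1])
    if tag == "var":
        return k(env(e[1]))
    if tag == "lam":
        _, x, body = e
        return k(("clo", x, body, env))
    if tag == "app":
        _, e1, e2 = e

        def kapp1(v1):                 # rest after the first argument
            def kapp2(v2):             # rest after the second argument
                _, x, body, clo_env = v1

                def kclo1(v):          # rest after the closure body
                    return k(v)

                return eval_cps(body, update(x, v2, clo_env), kclo1)

            return eval_cps(e2, env, kapp2)

        return eval_cps(e1, env, kapp1)
    raise ValueError(f"unknown term: {tag}")


const_fn = ("lam", "a", ("lam", "b", ("var", "a")))
program = ("app", ("app", const_fn, ("val", 1)), ("val", 2))
result = eval_cps(program, empty, lambda v: v)
```

Only the evaluator takes a continuation; `update` and `empty` stay in direct style, mirroring the partial CPS conversion described above.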
Generalization. Next, we modify the continuation definitions so that they handle
both the case when the term is a value (the original case) and the case where it is 
still a term that needs to be evaluated. To achieve this goal, we introduce a case 
analysis on the input. If the continuation’s argument is a value, the evaluation 
will proceed as before. Otherwise it will call eval with itself as the continuation 
argument. Intuitively, the latter case will correspond to a congruence rule in the 
resulting small-step semantics and we refer to these as congruence cases in the 
rest of this paper. 


let rec eval e ρ k = case e of {
  val(v) → (k val(v)) |
  var(x) → let v = (ρ x) in (k val(v)) |
  lam(x, e') → (k val(clo(x, e', ρ))) |
  app(e1, e2) →
    letcont kapp1 e1 =
      case e1 of {
        val(v1) →
          letcont kapp2 e2 =
            case e2 of {
              val(v2) →
                case v1 of {
                  clo(x, e', ρ') →
                    let ρ'' = (update x v2 ρ') in
                    letcont kclo1 e =
                      case e of {
                        val(v) → (k val(v)) |
                        ELSE(e) → (eval e ρ'' (λe'. (kclo1 e')))
                      } in
                    (eval e' ρ'' (λv. (kclo1 v)))
                } |
              ELSE(e2) → (eval e2 ρ (λe2'. (kapp2 e2')))
            } in
          (eval e2 ρ (λv2. (kapp2 v2))) |
        ELSE(e1) → (eval e1 ρ (λe1'. (kapp1 e1')))
      } in
    (eval e1 ρ (λv1. (kapp1 v1)))
}


Argument Lifting. The free variables inside each continuation can be divided
into those that depend on the top-level term and those that parameterize the
evaluation. The former category contains variables dependent on subterms of
the top-level term, either by standing for a subterm itself, or by being derived
from it. In our example, for kapp1, it is the variable e2, i.e., the right argument
of app; for kapp2, the variable v1 as the value resulting from evaluating
the left argument; and for kclo1, it is the environment obtained by extending
the closure’s environment by binding the closure variable to the operand value
(ρ'' derived from v2). We lift variables that fall into the first category, that is,
variables derived from the input term. We leave variables that parameterize the
evaluation, such as the input environment or the store, unlifted. The rationale
is that, eventually, we want the continuations to act as term constructors and
they need to carry information not contained in arguments passed to eval.
let rec eval e ρ k = case e of {
  ...
  app(e1, e2) →
    letcont kapp1 e2 e1 =
      ...
          letcont kapp2 v1 e2 =
            ...
                    letcont kclo1 ρ' e =
                      case e of {
                        val(v) → (k val(v)) |
                        ELSE(e) → (eval e ρ' (λe'. (kclo1 ρ' e')))
                      } in
                    (eval e' ρ'' (λv. (kclo1 ρ'' v)))
                } |
              ELSE(e2) → (eval e2 ρ (λe2'. (kapp2 v1 e2')))
            } in
          (eval e2 ρ (λv2. (kapp2 v1 v2))) |
        ELSE(e1) → (eval e1 ρ (λe1'. (kapp1 e2 e1')))
      } in
    (eval e1 ρ (λv1. (kapp1 e2 v1)))
}

Continuations Switch Control. Since continuations now handle the full evalu- 
ation of their argument themselves, they can be used to switch stages in the 
evaluation of a term. Observe how in the resulting evaluator below, the evaluation
of an app term progresses through stages initiated by kapp1, kapp2, and
finally kclo1.


let rec eval e ρ k = case e of {
  ...
  app(e1, e2) →
    letcont kapp1 e2 e1 =
      ...
          letcont kapp2 v1 e2 =
            ...
                    letcont kclo1 ρ' e =
                      ...
                    in (kclo1 ρ'' e')
            ...
          in (kapp2 v1 e2) |
      ...
    in (kapp1 e2 e1)
}

Defunctionalization. In the next step, we defunctionalize continuations. For each 
continuation, we introduce a constructor with the corresponding number of 
arguments. The apply function gives the meaning of each defunctionalized 
continuation. 
let rec apply eval ek ρ k = case ek of {
  kapp1(e2, e1) →
    case e1 of {
      val(v1) → (apply eval kapp2(v1, e2) ρ k) |
      ELSE(e1) → (eval e1 ρ (λe1'. (apply eval kapp1(e2, e1') ρ k)))
    } |
  kapp2(v1, e2) →
    case e2 of {
      val(v2) →
        case v1 of {
          clo(x, e', ρ') →
            let ρ'' = (update x v2 ρ')
            in (apply eval kclo1(ρ'', e') ρ k)
        } |
      ELSE(e2) → (eval e2 ρ (λe2'. (apply eval kapp2(v1, e2') ρ k)))
    } |
  kclo1(ρ', e) →
    case e of {
      val(v) → (k val(v)) |
      ELSE(e) → (eval e ρ' (λe'. (apply eval kclo1(ρ', e') ρ k)))
    }
} in
let rec eval e ρ k = case e of {
  val(v) → (k val(v)) |
  var(x) → let v = (ρ x) in (k val(v)) |
  lam(x, e') → (k val(clo(x, e', ρ))) |
  app(e1, e2) → (apply eval kapp1(e2, e1) ρ k)
}
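
The defunctionalized evaluator can be sketched in Python as follows, again under the hypothetical tuple encoding used for illustration. Continuations become the tagged tuples `("kapp1", e2, e1)`, `("kapp2", v1, e2)` and `("kclo1", env, e)`. Python supports mutual recursion directly, so, unlike in the listing above, `eval_` does not need to be passed to `apply` as an argument.

```python
def empty(x):
    raise KeyError(f"unbound variable: {x}")


def update(x, v, env):
    return lambda y: v if y == x else env(y)


def apply(ek, env, k):
    """Give meaning to a defunctionalized continuation ek."""
    tag = ek[0]
    if tag == "kapp1":
        _, e2, e1 = ek
        if e1[0] == "val":
            return apply(("kapp2", e1[1], e2), env, k)
        return eval_(e1, env, lambda e1p: apply(("kapp1", e2, e1p), env, k))
    if tag == "kapp2":
        _, v1, e2 = ek
        if e2[0] == "val":
            _, x, body, clo_env = v1
            return apply(("kclo1", update(x, e2[1], clo_env), body), env, k)
        return eval_(e2, env, lambda e2p: apply(("kapp2", v1, e2p), env, k))
    if tag == "kclo1":
        _, clo_env, e = ek
        if e[0] == "val":
            return k(e)
        return eval_(e, clo_env, lambda ep: apply(("kclo1", clo_env, ep), env, k))
    raise ValueError(f"unknown continuation: {tag}")


def eval_(e, env, k):
    tag = e[0]
    if tag == "val":
        return k(e)
    if tag == "var":
        return k(("val", env(e[1])))
    if tag == "lam":
        return k(("val", ("clo", e[1], e[2], env)))
    if tag == "app":
        return apply(("kapp1", e[2], e[1]), env, k)
    raise ValueError(f"unknown term: {tag}")


const_fn = ("lam", "a", ("lam", "b", ("var", "a")))
program = ("app", ("app", const_fn, ("val", 1)), ("val", 2))
result = eval_(program, empty, lambda t: t)
```

Because continuations are now plain data, they can be returned, inspected, and later translated into terms, which is what the following steps rely on.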


Remove Tail-Calls. We can now move from a recursive evaluator to a stepping 
function by modifying the continuation arguments passed to eval in congruence 
cases. Instead of calling apply on the defunctionalized continuation, we return 
the defunctionalized continuation itself. Note that we leave intact those calls to
apply that switch control between different continuations (e.g., in the definition 
of eval). 


let rec apply eval ek ρ k = case ek of {
  kapp1(e2, e1) →
    case e1 of {
      val(v1) → (apply eval kapp2(v1, e2) ρ k) |
      ELSE(e1) → (eval e1 ρ (λe1'. (k kapp1(e2, e1'))))
    } |
  kapp2(v1, e2) →
    case e2 of {
      val(v2) → ... (apply eval kclo1(ρ'', e') ρ k) |
      ELSE(e2) → (eval e2 ρ (λe2'. (k kapp2(v1, e2'))))
    } |
  kclo1(ρ', e) →
    case e of {
      val(v) → (k val(v)) |
      ELSE(e) → (eval e ρ' (λe'. (k kclo1(ρ', e'))))
    }
} in ...


Convert Continuations into Terms. At this point, we have a stepping func- 
tion that returns either a term or a continuation, but we want a function 
returning only terms. The most straightforward approach to achieving this goal 
would be to introduce a term constructor for each defunctionalized continuation 
constructor. However, many of these continuation constructors can be trivially 
expressed using constructors already present in the object language. We want to 
avoid introducing redundant terms, so we aim to reuse existing constructors as 
much as possible. In our example we observe that kapp1(e2, e1) corresponds to
app(e1, e2), while kapp2(v1, e2) corresponds to app(val(v1), e2). We might also
observe that kclo1(ρ', e) would correspond to app(clo(x, e, ρ), val(v2)) if
ρ' = update x v2 ρ. Our current implementation does not handle such cases, however,
and so we introduce kclo1 as a new term constructor.


let rec apply eval ek ρ k = case ek of {
  kapp1(e2, e1) →
    case e1 of {
      val(v1) → (apply eval kapp2(v1, e2) ρ k) |
      ELSE(e1) → (eval e1 ρ (λe1'. (k app(e1', e2))))
    } |
  kapp2(v1, e2) →
    case e2 of {
      val(v2) →
        case v1 of {
          clo(x, e', ρ') → let ρ'' = (update x v2 ρ') in kclo1(ρ'', e')
        } |
      ELSE(e2) → (eval e2 ρ (λe2'. (k app(val(v1), e2'))))
    } |
  kclo1(ρ', e) →
    case e of {
      val(v) → (k val(v)) |
      ELSE(e) → (eval e ρ' (λe'. (k kclo1(ρ', e'))))
    }
} in
let rec eval e ρ k = case e of {
  ...
  kclo1(ρ', e') → (apply eval kclo1(ρ', e') ρ k)
}


Inlining and Simplification. Next, we eliminate the apply function by inlining 
its applications and simplifying the result. At this point we have obtained a 
small-step interpreter in continuation-passing style. 
let rec eval e ρ k = case e of {
  ...
  app(e1, e2) →
    case e1 of {
      val(v1) →
        case e2 of {
          val(v2) →
            case v1 of {
              clo(x, e', ρ') → let ρ'' = (update x v2 ρ') in kclo1(ρ'', e')
            } |
          ELSE(e2) → (eval e2 ρ (λe2'. (k app(val(v1), e2'))))
        } |
      ELSE(e1) → (eval e1 ρ (λe1'. (k app(e1', e2))))
    } |
  kclo1(ρ', e') →
    case e' of {
      val(v) → (k val(v)) |
      ELSE(e) → (eval e ρ' (λe'. (k kclo1(ρ', e'))))
    }
}


Convert to Direct Style and Remove the Value Case. The final transformation 
is to convert our small-step interpreter back to direct style. Moreover, we also 
remove the value case val(v) → val(v), as we usually do not want values to step.


let rec eval e ρ = case e of {
  var(x) → let v = (ρ x) in val(v) |
  lam(x, e') → val(clo(x, e', ρ)) |
  app(e1, e2) →
    case e1 of {
      val(v1) →
        case e2 of {
          val(v2) →
            case v1 of {
              clo(x, e', ρ') → let ρ'' = (update x v2 ρ') in kclo1(ρ'', e')
            } |
          ELSE(e2) → let e2' = (eval e2 ρ) in app(val(v1), e2')
        } |
      ELSE(e1) → let e1' = (eval e1 ρ) in app(e1', e2)
    } |
  kclo1(ρ', e') →
    case e' of {
      val(v) → val(v) |
      ELSE(e) → let e' = (eval e ρ') in kclo1(ρ', e')
    }
}
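
The final interpreter is a stepping function, which the following Python sketch makes explicit. The tuple encoding and the names `step` and `run` are assumptions of this illustration; `run` is not part of the transformation and only iterates the stepping function until a value term is reached.

```python
def empty(x):
    raise KeyError(f"unbound variable: {x}")


def update(x, v, env):
    return lambda y: v if y == x else env(y)


def step(e, env):
    """Perform a single small step on a non-value term."""
    tag = e[0]
    if tag == "var":
        return ("val", env(e[1]))
    if tag == "lam":
        return ("val", ("clo", e[1], e[2], env))
    if tag == "app":
        _, e1, e2 = e
        if e1[0] == "val":
            if e2[0] == "val":
                _, x, body, clo_env = e1[1]
                return ("kclo1", update(x, e2[1], clo_env), body)
            return ("app", e1, step(e2, env))      # congruence on the operand
        return ("app", step(e1, env), e2)          # congruence on the operator
    if tag == "kclo1":
        _, clo_env, body = e
        if body[0] == "val":
            return body
        return ("kclo1", clo_env, step(body, clo_env))  # step under closure env
    raise ValueError(f"cannot step: {tag}")


def run(e, env):
    """Iterate step until a value term is reached; also count the steps."""
    steps = 0
    while e[0] != "val":
        e = step(e, env)
        steps += 1
    return e, steps


const_fn = ("lam", "a", ("lam", "b", ("var", "a")))
program = ("app", ("app", const_fn, ("val", 1)), ("val", 2))
final, steps = run(program, empty)
```

Note that `kclo1` carries the extended closure environment, which is why the body is stepped in that environment rather than in the ambient one.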


Small-Step Evaluator. Fig. 3 shows the small-step rules corresponding to our
last interpreter. Barring the introduction of the kclo1 constructor, the resulting
semantics is essentially identical to one we would write manually. 
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v=pz 
2 
pF var(x) > val(v) pF lam(z, e’) > val(clo(z, e’, p)) 


p” = update x v2 p' 


p+ app(val(clo(z, e’, p’)), val(v2)) > kclol(p”, e’) 


4 pre > e pre-e; 
pF app(val(v;), c2) > app(val(v:),¢s) pF apple, c2) > applet, c2) 

7 pre se 
pt kelol(p’, val(v)) — val(v) p* kclol(p’,e) > kclol(p’, e’) 


Fig. 3. Resulting small-step semantics 


3 Big-Step Specifications 


We define our transformations on an untyped extended λ-calculus with call-by-value
semantics that allows the straightforward definition of big- and small-step
interpreters. We call this language an evaluator definition language (EDL). 


3.1 Evaluator Definition Language 


Table 1 gives the syntax of EDL. We choose to restrict ourselves to A-normal 
form, which greatly simplifies our partial CPS conversion without compromising 
readability. Our language has the usual call-by-value semantics, with arguments 
being evaluated left-to-right. All of the examples of the previous section were 
written in this language. 

Our language has 3 forms of let-binding constructs: the usual (optionally 
recursive) let, a let-construct for evaluator definition, and a let-construct for 
defining continuations. The behavior of all three constructs is the same; however,
we treat them differently during the transformations. The leteval construct also
comes with the additional static restriction that it may appear only once (i.e., 
there can be only one evaluator). The leteval and letcont forms are recursive 
by default, while let has an optional rec specifier to create a recursive binding. 
For simplicity, our language does not offer implicit mutual recursion, so mutual 
recursion has to be made explicit by inserting additional arguments. We do this 
when we generate the apply function during defunctionalization. 


Notation and Presentation. We use vector notation to denote syntactic lists
belonging to a particular sort. For example, e⃗ and ae⃗ are lists of elements of,
respectively, Expr and AExpr, while x⃗ is a list of variables. Separators can be
spaces (e.g., function arguments) or commas (e.g., constructor arguments or
configuration components). We expect the actual separator to be clear from the
context. Similarly for lists of expressions: e⃗, ae⃗, etc. In let bindings, f x1 ... xn =
e and f = λx1 ... xn. e are both syntactic sugar for f = λx1. ... λxn. e.
Table 1. Syntax of the evaluator definition language.


Expr   ∋ e   ::= let bn = ce in e                 (let-binding)
               | let rec bn = ce in e             (recursive let-binding)
               | leteval x = ce in e              (evaluator definition)
               | letcont k = ce in e              (continuation definition)
               | ce
CExpr  ∋ ce  ::= (ae ae ...)                      (application)
               | case ae of { cas | ... | cas }   (pattern matching)
               | if ae then e else e              (conditional)
               | ae
AExpr  ∋ ae  ::= v | op                           (value, operator)
               | x | k                            (variable, continuation variable)
               | λbn. e                           (λ-abstraction)
               | c(ae, ..., ae)                   (constructor application)
               | ⟨ae, ..., ae⟩                    (configuration expression)
Binder ∋ bn  ::= x | ⟨x, ..., x⟩                  (variable, configuration)
Case   ∋ cas ::= c(x, ..., x) → e                 (constructor pattern)
               | ELSE(x) → e                      (default pattern)


Value  ∋ v   ::= n | b | c(v, ..., v) | ⟨v, ..., v⟩ | abs(λx.e, ρ)


4 Transformation Steps 


In this section, we formally define each of the transformation steps informally 
described in Section 2. For each transformation function, we list only the most 
relevant cases; the remaining cases trivially recurse on the A-normal form (ANF) 
abstract syntax. We annotate functions with E, CE, and AE to indicate the corresponding
ANF syntactic classes. We omit annotations when a function only operates
on a single syntactic class. For readability, we annotate meta-variables to hint
at their intended use: ρ stands for read-only entities (such as environments),
whereas σ stands for read-write or “state-like” entities of a configuration (e.g.,
stores or exception states). These can be mixed with our notation for syntactic
lists, so, for example, x⃗^σ is a sequence of variables referring to state-like entities,
while ae⃗^ρ is a sequence of a-expressions corresponding to read-only entities.


4.1 CPS Conversion 


The first stage of the process is a partial CPS conversion [8,25] to make control 
flow in the evaluator explicit. We limit this transformation to the main evalu- 
ator function, i.e., only the function eval will take an additional continuation 
argument and will pass results to it. Because our input language is already in 
ANF, the conversion is relatively easy to express. In particular, applications of 
the evaluator are always let-bound to a variable (or appear in a tail position), 
which makes constructing the current continuation straightforward. Below are
the relevant clauses of the conversion. For this transformation we assume the 
following easily checkable properties: 


— The evaluator name is globally unique. 
— The evaluator is never applied partially. 
— All bound variables are distinct. 


The conversion is defined as three mutually recursive functions with the following 
signatures: 


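\[
\begin{aligned}
\mathit{cps}_{E} &: \mathit{Expr} \to (\mathit{CExpr} \to \mathit{Expr}) \to \mathit{Expr}\\
\mathit{cps}_{CE} &: \mathit{CExpr} \to (\mathit{CExpr} \to \mathit{Expr}) \to \mathit{Expr}\\
\mathit{cps}_{AE} &: \mathit{AExpr} \to \mathit{AExpr}
\end{aligned}
\]


In the equations, $K, I, A_k : \mathit{CExpr} \to \mathit{Expr}$ are meta-continuations; $I$ injects a
CExpr into Expr.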


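\[
\begin{aligned}
&\mathit{cps}_{E}[\![\text{leteval } \mathit{eval}\ bn = e_1 \text{ in } e_2]\!]\ K =\\
&\qquad \text{leteval } \mathit{eval}\ bn\ k = (\mathit{cps}_{E}[\![e_1]\!]\ A_k) \text{ in } (\mathit{cps}_{E}[\![e_2]\!]\ K)\\
&\qquad \text{where } k \text{ is a fresh continuation variable}\\[4pt]
&\mathit{cps}_{E}[\![\text{let } bn = (\mathit{eval}\ ae_1\ \vec{ae}) \text{ in } e]\!]\ K =\\
&\qquad \text{letcont } k\ bn = (\mathit{cps}_{E}[\![e]\!]\ K) \text{ in } \mathit{cps}_{CE}[\![(\mathit{eval}\ ae_1\ \vec{ae})]\!]\ A_k\\
&\qquad \text{where } k \text{ is a fresh continuation variable}\\[4pt]
&\mathit{cps}_{E}[\![\text{let } bn = ce \text{ in } e]\!]\ K =\\
&\qquad \mathit{renorm}[\![\text{let}'\ bn = (\mathit{cps}_{CE}[\![ce]\!]\ I) \text{ in } (\mathit{cps}_{E}[\![e]\!]\ K)]\!]\\[4pt]
&\mathit{cps}_{CE}[\![(\mathit{eval}\ ae_1\ \vec{ae})]\!]\ K = (\mathit{eval}\ (\mathit{cps}_{AE}[\![ae_1]\!])\ (\mathit{cps}_{AE}[\![\vec{ae}]\!])\ (\lambda x.\, K[\![x]\!]))\\
&\qquad \text{where } x \text{ is a fresh variable}\\[4pt]
&\mathit{cps}_{CE}[\![ae]\!]\ K = K[\![\mathit{cps}_{AE}[\![ae]\!]]\!]\\[4pt]
&\mathit{cps}_{AE}[\![\lambda x.\, e]\!] = \lambda x.\, (\mathit{cps}_{E}[\![e]\!]\ I)\\[4pt]
&\mathit{cps}_{AE}[\![ae]\!] = ae
\end{aligned}
\]

where, for any $k$, $A_k$ is defined as

\[
\begin{aligned}
A_k[\![ae]\!] &= k\ ae\\
A_k[\![ce]\!] &= \text{let } x = ce \text{ in } k\ x \qquad \text{where } x \text{ is fresh}
\end{aligned}
\]

and

\[
\begin{aligned}
\mathit{renorm}[\![\text{let}'\ x = ce \text{ in } e]\!] &= \text{let } x = ce \text{ in } e\\
\mathit{renorm}[\![\text{let}'\ x = (\text{let } x' = ce \text{ in } e') \text{ in } e]\!] &= \text{let } x' = ce \text{ in } \mathit{renorm}[\![\text{let}'\ x = e' \text{ in } e]\!]
\end{aligned}
\]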


218 F. Vesely and K. Fisher 


In the above equations, let’ is a pseudo-construct used to make renormalization
more readable. In essence, it is a non-ANF version of let where the bound
expression is generalized to Expr. Note that renorm only works correctly
if $x' \notin \mathit{fv}(e)$, which is implied by our assumption that all bound
variables are distinct.
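
The effect of this partial CPS conversion can be illustrated on a tiny call-by-value λ-calculus evaluator. The following Python sketch is only an illustration written by hand: the tuple encoding of terms and the names `eval_direct`, `eval_cps`, `kapp1` and `kapp2` are assumptions made for this example, not output of the transformation described in the paper.

```python
# Terms:  ("val", v) | ("var", x) | ("lam", x, body) | ("app", e1, e2)
# Values: Python ints or closures ("clo", x, body, env); env is a dict.
# This encoding is hypothetical and chosen only for the illustration.

def eval_direct(term, env):
    """Big-step evaluator in direct style."""
    tag = term[0]
    if tag == "val":
        return term[1]
    if tag == "var":
        return env[term[1]]
    if tag == "lam":
        return ("clo", term[1], term[2], env)
    if tag == "app":
        v1 = eval_direct(term[1], env)             # let-bound call to eval
        v2 = eval_direct(term[2], env)             # let-bound call to eval
        _, x, body, cenv = v1
        return eval_direct(body, {**cenv, x: v2})  # call to eval in tail position
    raise ValueError(f"unknown term: {term!r}")

def eval_cps(term, env, k):
    """The same evaluator with only eval taking a continuation k."""
    tag = term[0]
    if tag == "val":
        return k(term[1])
    if tag == "var":
        return k(env[term[1]])
    if tag == "lam":
        return k(("clo", term[1], term[2], env))
    if tag == "app":
        def kapp1(v1):                  # continuation of the first let-bound call
            def kapp2(v2):              # continuation of the second let-bound call
                _, x, body, cenv = v1
                return eval_cps(body, {**cenv, x: v2}, k)
            return eval_cps(term[2], env, kapp2)
        return eval_cps(term[1], env, kapp1)
    raise ValueError(f"unknown term: {term!r}")

identity = ("lam", "x", ("var", "x"))
program = ("app", identity, ("val", 42))
```

Each let-bound call to the evaluator becomes a named continuation, and the call in tail position simply receives the current continuation `k`.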


4.2 Generalization of Continuations 


The continuations resulting from the above CPS conversion expect to be applied 
to value terms. The next step is to generalize (or “lift” ) the continuations so that 
they recursively call the evaluator to evaluate non-value arguments. In other 
words, assuming the term type can be factored into values and computations
$V + C$, we convert each continuation $k$ with the type $V \to V$ into a continuation
$k' : V + C \to V$ using the following schema:


\[ \text{let rec } k'\ t = \text{case } t \text{ of } \mathrm{inl}\ v \to k\ v \mid \mathrm{inr}\ c \to \mathit{eval}\ c\ k' \]


The recursive clauses will correspond to congruence rules in the resulting small- 
step semantics. 

The transformation works by finding the unique application site of the con- 
tinuation and then inserting the corresponding call to eval in the non-value case. 


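\[
\begin{aligned}
&\mathit{gencont}_{E}[\![\text{letcont } k\ \langle x, \vec{x}^{\sigma}\rangle = e_k \text{ in } e]\!] =\\
&\qquad \text{letcont } k\ \langle \hat{x}, \vec{x}^{\sigma}\rangle =\\
&\qquad\quad \text{case } \hat{x} \text{ of } \{\\
&\qquad\qquad \mathrm{val}(x) \to e_k\,;\\
&\qquad\qquad \mathrm{ELSE}(\hat{x}) \to \mathit{eval}\ \langle \hat{x}, \vec{ae}^{\sigma}\rangle\ \vec{ae}^{\rho}\ ae_k\\
&\qquad\quad \}\\
&\qquad \text{if } \mathit{findApp}\ k\ e = \mathit{eval}\ \langle \_, \vec{ae}^{\sigma}\rangle\ \vec{ae}^{\rho}\ ae_k
\end{aligned}
\]

where


- findApp k e is the unique use site of the continuation k in expression e, that
is, the CExpr where eval is applied with k as its continuation; and

- $\hat{x}$ is a fresh variable associated with x; it stands for “a term corresponding
to (the value) x”.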


Following the CPS conversion, each named continuation is applied exactly 
once in e, so findApp k e is total and returns the continuation’s unique use site.
Moreover, because the continuation was originally defined and let-bound at that 
use site, all free variables in findApp k e are also free in the definition of k. 

When performing this generalization transformation, we also modify tail
positions in eval that return a value so that they wrap their result in the val
constructor. That is, if the continuation parameter of eval is k, then we rewrite all
sites applying k to a configuration as follows:


\[ k\ \langle ae, \vec{ae}^{\sigma}\rangle \;\Rightarrow\; k\ \langle \mathrm{val}(ae), \vec{ae}^{\sigma}\rangle \]
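
The schema for generalized continuations can be tried out on a small arithmetic language. The Python sketch below is an illustration under assumptions: the helper `generalize`, the tuple encoding of terms, and the continuation names `kadd1` and `kadd2` are hypothetical and do not come from the paper. Value-expecting continuations are wrapped so that they also accept non-value terms, and results passed to the evaluator's own continuation are wrapped in `val`.

```python
# Terms: ("val", n) | ("add", t1, t2). Hypothetical encoding for illustration.

def generalize(k, evaluate):
    """Lift a continuation on values to one on terms:
    k' t = case t of val(v) -> k v | otherwise -> evaluate t k'."""
    def k_gen(t):
        if t[0] == "val":
            return k(t[1])
        return evaluate(t, k_gen)   # congruence case: evaluate the argument first
    return k_gen

def evaluate(term, k):
    """CPS evaluator whose continuation k accepts terms, not bare values."""
    tag = term[0]
    if tag == "val":
        return k(term)                          # already wrapped in val
    if tag == "add":
        def kadd1(v1):
            def kadd2(v2):
                return k(("val", v1 + v2))      # wrap the result in val
            return evaluate(term[2], generalize(kadd2, evaluate))
        return evaluate(term[1], generalize(kadd1, evaluate))
    raise ValueError(f"unknown term: {term!r}")

double = generalize(lambda v: v * 2, evaluate)
```

Applying `double` to a value term calls the original continuation directly, while applying it to an `add` term first runs the evaluator and then resumes in `double`.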


4.3 Argument Lifting in Continuations


In the next phase, we partially lift free variables in continuations to make them 
explicit arguments. We perform a selective lifting in that we avoid lifting non- 
term arguments to the evaluation function. These arguments represent entities 
that parameterize the evaluation of a term. If an entity is modified during
evaluation, the modified entity variable gets lifted. In the running example of
Section 2, such a lifting occurred for kclo1.

Function lift specifies the transformation at the continuation definition site: 


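\[
\begin{aligned}
&\mathit{lift}\ \Xi\ \Delta\ [\![\text{letcont } k = \lambda x.\, e_k \text{ in } e]\!] =\\
&\qquad \text{letcont } k = \lambda x_1 \ldots x_n\, x.\, (\mathit{lift}\ \Xi'\ \Delta'\ [\![e_k]\!]) \text{ in } (\mathit{lift}\ \Xi'\ \Delta'\ [\![e]\!])
\end{aligned}
\]

where

- $\Xi' = \Xi \cup \{k\}$

- $\{x_1, \ldots, x_n\} = \mathit{fv}\,e_k \cup \bigl(\bigcup_{g \in (\mathrm{dom}\,\Delta\, \cap\, \mathit{fv}\,e_k)} \Delta(g)\bigr) - \Xi$
- $\Delta' = \Delta[k \mapsto (x_1, \ldots, x_n)]$

and at the continuation application site (recall that continuations are always
applied fully, but at this point they are only applied to one argument):


\[ \mathit{lift}\ \Xi\ \Delta\ [\![k\ ae]\!] = k\ x_1 \ldots x_n\ (\mathit{lift}\ \Xi\ \Delta\ [\![ae]\!]) \]


if $k \in \mathrm{dom}\,\Delta$ and $\Delta(k) = (x_1, \ldots, x_n)$.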

Our lifting function is a restricted version of a standard argument-lifting 
algorithm [19]. The first restriction is that we do not lift all free variables, since 
we do not aim to float and lift the continuations to the top-level of the program, 
only to the top-level of the evaluation function. The other difference is that we 
can use a simpler way to compute the set of lifted parameters due to the absence 
of mutual recursion between continuations. The correctness of this can be proved 
using the approach of Fischbach [16]. 




4.4 Continuations Switch Control Directly 


At this point, continuations handle the full evaluation of a term themselves. 
Instead of calling eval with the continuation as an argument, we can call the 
continuation directly to switch control between evaluation stages of a term. We 
will replace original eval call sites with direct applications of the corresponding 
continuations. The recursive call to eval in congruence cases of continuations will 
be left untouched, as this is where the continuation’s argument will be evaluated 
to a value. Following from the continuation generalization transformation, this 
call to eval is with the same arguments as in the original site (which we are now 
replacing). In particular, eval is invoked with the same $\vec{ae}^{\rho}$ arguments in the
continuation body as in the original call site. 


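\[
\begin{aligned}
&\mathit{directcont}_{E}[\![\text{letcont } k = ce \text{ in } e]\!]\ K =\\
&\qquad \text{letcont } k = \mathit{directcont}_{CE}[\![ce]\!]\ K \text{ in } \mathit{directcont}_{E}[\![e]\!]\ (K \uplus \{k\})\\[4pt]
&\mathit{directcont}_{CE}[\![\mathit{eval}\ \langle ae, \vec{ae}^{\sigma}\rangle\ \vec{ae}^{\rho}\ (\lambda y.\, k\ \vec{x}\ y)]\!]\ K = k\ \vec{x}\ \langle ae, \vec{ae}^{\sigma}\rangle \qquad \text{if } k \in K
\end{aligned}
\]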


4.5 Defunctionalization


Now we can move towards a first-order representation of continuations which can 
be further converted into term constructions. We defunctionalize continuations 
by first collecting all continuations in eval, then introducing corresponding con- 
structors (the syntax), and finally generating an apply function (the semantics). 
The collection function accumulates continuation names and their definitions. 
At the same time it removes the definitions. 


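\[
\begin{aligned}
&\mathit{collect}_{E}[\![\text{letcont } k = ce \text{ in } e]\!] = (\{(k, ce')\} \cup K_{ce} \cup K_{e},\ e')\\
&\qquad \text{where } (K_{ce}, ce') = \mathit{collect}_{CE}[\![ce]\!]\\
&\qquad \phantom{\text{where }} (K_{e}, e') = \mathit{collect}_{E}[\![e]\!]
\end{aligned}
\]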


We reuse continuation names for constructors. The apply function is generated 
by simply generating a case analysis on the constructors and reusing the argu- 
ment names from the continuation function arguments. In addition to the defunc- 
tionalized continuations, the generated apply function will take the same argu- 
ments as eval. Because of the absence of mutual recursion in our meta-language, 
apply takes eval as an argument. 


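\[
\begin{aligned}
&\mathit{genApply}\ \vec{x}^{\sigma}\ \vec{x}^{\rho}\ k_{top}\ \{(k_1, \lambda p_{1,1} \ldots p_{1,i}.\, e_1), \ldots, (k_n, \lambda p_{n,1} \ldots p_{n,j}.\, e_n)\} =\\
&\qquad \lambda \mathit{eval}\ \langle x_k, \vec{x}^{\sigma}\rangle\ \vec{x}^{\rho}\ k_{top}.\\
&\qquad\quad \text{case } x_k \text{ of } \{\\
&\qquad\qquad k_1(p_{1,1}, \ldots, p_{1,i}) \to e_1\,;\\
&\qquad\qquad \vdots\\
&\qquad\qquad k_n(p_{n,1}, \ldots, p_{n,j}) \to e_n\\
&\qquad\quad \}
\end{aligned}
\]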


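Now we need a way to replace calls to continuations with corresponding calls to
apply. For $\vec{ae}^{\rho}$ and $k_{top}$ we use the arguments passed to eval or apply
(depending on where we are replacing).


\[ \mathit{replace}_{CE}[\![k\ \vec{ae}_k\ \langle ae, \vec{ae}^{\sigma}\rangle]\!]\ (\vec{x}^{\rho}, k_{top}) = \mathit{apply}\ \mathit{eval}\ \langle k(\vec{ae}_k, ae), \vec{ae}^{\sigma}\rangle\ \vec{x}^{\rho}\ k_{top} \]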


Finally, the complete defunctionalization is defined in terms of the above three 
functions. 
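
As a point of reference, textbook defunctionalization of a CPS evaluator can be sketched in a few lines of Python. This is a minimal sketch of the classic technique on an arithmetic language, not the genApply function defined above: here each continuation record stores the next continuation explicitly, and the names `ktop`, `kadd1` and `kadd2` are assumptions made for the example.

```python
# Terms: ("val", n) | ("add", t1, t2).
# Continuations are first-order records (tagged tuples) instead of closures:
#   ("ktop",) | ("kadd1", e2, k_next) | ("kadd2", v1, k_next)

def evaluate(term, k):
    tag = term[0]
    if tag == "val":
        return apply(k, term[1])
    if tag == "add":
        # Formerly a closure capturing term[2] and k; now a constructor.
        return evaluate(term[1], ("kadd1", term[2], k))
    raise ValueError(f"unknown term: {term!r}")

def apply(k, v):
    """Give meaning to continuation constructors by case analysis."""
    tag = k[0]
    if tag == "ktop":
        return v
    if tag == "kadd1":
        _, e2, k_next = k
        return evaluate(e2, ("kadd2", v, k_next))
    if tag == "kadd2":
        _, v1, k_next = k
        return apply(k_next, v1 + v)
    raise ValueError(f"unknown continuation: {k!r}")

example = ("add", ("val", 1), ("add", ("val", 2), ("val", 3)))
```

The constructors play the role of syntax and `apply` the role of semantics, which is the split the section describes.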


4.6 Remove Self-recursive Tail-Calls 


This is the transformation which converts a recursive evaluator into a stepping 
function. The transformation itself is very simple: we simply replace the self- 
recursive calls to apply in congruence cases. 


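\[
\begin{aligned}
&\mathit{derec}_{CE}[\![\mathit{eval}\ \langle ae, \vec{ae}^{\sigma}\rangle\ \vec{ae}^{\rho}\ (\lambda\langle x', \vec{x}^{\sigma'}\rangle.\ \mathit{apply}\ \mathit{eval}\ \langle c_k(\vec{ae}, x'), \vec{x}^{\sigma'}\rangle\ \vec{ae}^{\rho}\ k)]\!] =\\
&\qquad \mathit{eval}\ \langle ae, \vec{ae}^{\sigma}\rangle\ \vec{ae}^{\rho}\ (\lambda\langle x', \vec{x}^{\sigma'}\rangle.\ k\ \langle c_k(\vec{ae}, x'), \vec{x}^{\sigma'}\rangle)
\end{aligned}
\]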

Note that we still leave those invocations of apply that serve to switch control
through the stages of evaluation. Unless a continuation constructor will become 
a part of the output language, its application will be inlined in the final phase 
of our transformation. 


4.7 Convert Continuations to Terms 


After defunctionalization, we effectively have two sorts of terms: those con- 
structed using the original constructors and those constructed using continu- 
ation constructors. Terms in these two sorts are given their semantics by the 
eval and apply functions, respectively. To get only one evaluator function at 
the end of our transformation process, we will join these two sorts, adding 
extra continuation constructors as new term constructors. We could simply
merge apply into eval; however, this would give us many overlapping
constructors. For example, in Section 2, we established that
$\mathrm{kapp1}(e_2, e_1) \approx \mathrm{app}(e_1, e_2)$ and
$\mathrm{kapp2}(v_1, e_2) \approx \mathrm{app}(\mathrm{val}(v_1), e_2)$. The inference of equivalent term
constructors is guided by the following simple principle. For each continuation term
$c_k(ae_1, \ldots, ae_n)$ we are looking for a term $c'(ae'_1, \ldots, ae'_m)$ such that, for all
$\vec{ae}^{\sigma}$, $\vec{ae}^{\rho}$ and $ae_k$,

\[
\mathit{apply}\ \mathit{eval}\ \langle c_k(ae_1, \ldots, ae_n), \vec{ae}^{\sigma}\rangle\ \vec{ae}^{\rho}\ ae_k
= \mathit{eval}\ \langle c'(ae'_1, \ldots, ae'_m), \vec{ae}^{\sigma}\rangle\ \vec{ae}^{\rho}\ ae_k
\]

In our current implementation, we use a conservative approach where, start- 
ing from the cases in eval, we search for continuations reachable along a control 
flow path. Variables appearing in the original term are instantiated along the 
way. Moreover, we collect variables dependent on configuration entities (state). 
If control flow is split based on information derived from the state, we auto- 
matically include any continuation constructors reachable from that point as 
new constructors in the resulting language and interpreter. This, together with 
how information flows from the top-level term to subterms in congruence cases, 
preserves the coupling between state and corresponding subterms between steps. 

If, starting from an input term $c(\vec{x})$, an invocation of apply on a continuation
term $c_k(\vec{ae}_k)$ is reached, and if, after instantiating the variables in the input
term $c(\vec{ae})$, the sets of their free variables are equal, then we can introduce a
translation from $c_k(\vec{ae}_k)$ into $c(\vec{ae})$. If such a direct path is not found, $c_k$ will
become a new term constructor in the language and a case in eval is introduced
such that the above equation is satisfied.


4.8 Inlining, Simplification and Conversion to Direct Style 


To finalize the generation of a small-step interpreter, we inline all invocations
of apply and simplify the final program. After this, the interpreter will
consist of only the eval function, still in continuation-passing style. To convert the
interpreter to direct style, we simply substitute $(\lambda x.\, x)$ for eval’s continuation
variable and reduce the new redexes. Then we remove the continuation argument,
performing rewrites following the scheme:


\[ \mathit{eval}\ \vec{ae}\ (\lambda bn.\, e) \;\Rightarrow\; \text{let } bn = \mathit{eval}\ \vec{ae} \text{ in } e \]


Finally, we remove the reflexive case on values (i.e., $\mathrm{val}(v) \to \mathrm{val}(v)$). At this
point we have a small-step interpreter in direct form.
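
The shape of such a direct-style small-step interpreter can be sketched in Python. The code below is written by hand for a small arithmetic language and only illustrates the end result; the function names `step` and `run` and the term encoding are assumptions for this example. Each call to `step` performs exactly one transition, and an outer driver loop replaces the recursion of the big-step evaluator.

```python
# Terms: ("val", n) | ("add", t1, t2). Hypothetical encoding for illustration.

def step(term):
    """Perform a single small step; value terms have no step."""
    tag = term[0]
    if tag == "add":
        t1, t2 = term[1], term[2]
        if t1[0] != "val":
            return ("add", step(t1), t2)    # congruence rule for the left operand
        if t2[0] != "val":
            return ("add", t1, step(t2))    # congruence rule for the right operand
        return ("val", t1[1] + t2[1])       # both operands are values
    raise ValueError(f"no step for term: {term!r}")

def run(term):
    """Iterate step until a value is reached, recording every term."""
    trace = [term]
    while term[0] != "val":
        term = step(term)
        trace.append(term)
    return trace

example = ("add", ("val", 1), ("add", ("val", 2), ("val", 3)))
```

Running `run(example)` records the intermediate terms, which remain visible in the context of the original program.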


4.9 Removing Vacuous Continuations 


After performing the above transformation steps, we may end up with some 
redundant term constructors, which we call “empty” or vacuous. These are
constructors that have only one argument and whose semantics is equivalent to
that of the argument itself, save for an extra step which returns the computed
value. In other words, they are unary constructs which have only two rules in the
resulting small-step semantics, matching the following pattern.


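\[
\rho \vdash \langle c(\mathrm{val}(v)), \sigma\rangle \to \langle \mathrm{val}(v), \sigma\rangle
\qquad
\frac{\rho \vdash \langle e, \sigma\rangle \to \langle e', \sigma'\rangle}{\rho \vdash \langle c(e), \sigma\rangle \to \langle c(e'), \sigma'\rangle}
\]


Such a construct will result from a continuation that, even after generalization
and argument lifting, merely evaluates its sole argument and returns the
corresponding value:


\[
\begin{aligned}
&\text{letcont rec } k_i\ e = \text{case } e \text{ of } \{\\
&\qquad \mathrm{val}(v) \to k\ v \mid\\
&\qquad \mathrm{ELSE}(e) \to \mathit{eval}\ e\ (\lambda e'.\, k_i\ e')\\
&\}
\end{aligned}
\]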
These continuations can be easily identified and removed once argument lifting 
is performed, or at any point in the transformation pipeline, up until apply is 
absorbed into eval. 


4.10 Detour: Generating Pretty-Big-Step Semantics 


It is interesting to see what kind of semantics we get by rearranging or removing 
some steps of the above process. If, after CPS conversion, we do not general- 
ize the continuations, but instead just lift their arguments and defunctionalize 
them, we obtain a pretty-big-step [6] interpreter. The distinguishing feature of 
pretty-big-step semantics is that constructs which would normally have rules 
with multiple premises are factorized into intermediate constructs. As observed 
by Charguéraud, each intermediate construct corresponds to an intermediate 
state of the interpreter, which is why, in turn, they naturally correspond to 
continuations. Here are the pretty-big-step rules generated from the big-step 
semantics in Fig. 2 (Section 2). 


1 The complete transformation to pretty-big-step style involves these steps: 1. CPS 
conversion, 2. argument lifting, 3. removal of vacuous continuations, 4. defunction- 
alization, 5. merging of apply and eval, and 6. conversion to direct style. 
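
\[
\rho \vdash \mathrm{val}(v) \Downarrow v
\qquad
\frac{\rho \vdash e_1 \Downarrow v_1 \qquad \rho \vdash \mathrm{kapp1}(e_2, v_1) \Downarrow v}{\rho \vdash \mathrm{app}(e_1, e_2) \Downarrow v}
\]

\[
\frac{v = \rho\ x}{\rho \vdash \mathrm{var}(x) \Downarrow v}
\qquad
\frac{\rho \vdash e_2 \Downarrow v_2 \qquad \rho \vdash \mathrm{kapp2}(v_1, v_2) \Downarrow v}{\rho \vdash \mathrm{kapp1}(e_2, v_1) \Downarrow v}
\]

\[
\rho \vdash \mathrm{lam}(x, e') \Downarrow \mathrm{clo}(x, e', \rho)
\qquad
\frac{\rho'' = \mathit{update}\ x\ v_2\ \rho' \qquad \rho'' \vdash e' \Downarrow v}{\rho \vdash \mathrm{kapp2}(\mathrm{clo}(x, e', \rho'), v_2) \Downarrow v}
\]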


As we can see, the evaluation of app now proceeds through two intermediate
constructs, kapp1 and kapp2, which correspond to continuations introduced in
the CPS conversion. The evaluation of $\mathrm{app}(e_1, e_2)$ starts by evaluating $e_1$ to
$v_1$. Then kapp1 is responsible for evaluating $e_2$ to $v_2$. Finally, kapp2 evaluates
the closure body just as the third premise of the original rule for app does. Save
for a different order of arguments, the resulting intermediate constructs and their
rules are identical to Charguéraud’s examples.
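
These rules can be read directly as a recursive interpreter. The following Python sketch is a hand-written illustration, assuming a tuple encoding of terms and dictionaries for environments; the function name `pbs` is hypothetical. Each rule has at most one recursive premise on a subterm followed by a recursive call on an intermediate construct.

```python
# Terms:  ("val", v) | ("var", x) | ("lam", x, body) | ("app", e1, e2)
#         plus intermediate constructs ("kapp1", e2, v1) and ("kapp2", v1, v2).
# Values: Python ints or closures ("clo", x, body, env); env is a dict.

def pbs(term, env):
    """Pretty-big-step evaluation following the rules above."""
    tag = term[0]
    if tag == "val":
        return term[1]
    if tag == "var":
        return env[term[1]]
    if tag == "lam":
        return ("clo", term[1], term[2], env)
    if tag == "app":
        v1 = pbs(term[1], env)                    # evaluate e1 to v1
        return pbs(("kapp1", term[2], v1), env)   # continue in kapp1
    if tag == "kapp1":
        _, e2, v1 = term
        v2 = pbs(e2, env)                         # evaluate e2 to v2
        return pbs(("kapp2", v1, v2), env)        # continue in kapp2
    if tag == "kapp2":
        _, clo, v2 = term
        _, x, body, cenv = clo
        return pbs(body, {**cenv, x: v2})         # evaluate the closure body
    raise ValueError(f"unknown term: {term!r}")

k_combinator = ("lam", "x", ("lam", "y", ("var", "x")))
program = ("app", ("app", k_combinator, ("val", 1)), ("val", 2))
```

Evaluating `program` in the empty environment passes through `kapp1` and `kapp2` twice, once per application.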


4.11 Pretty-Printing 


For the purpose of presenting and studying the original and transformed seman- 
tics, we add a final pretty-printing phase. This amounts to generating inference 
rules corresponding to the control flow in the interpreter. This pretty-printing 
stage can be applied to both the big-step and small-step interpreters and was 
used to generate many of the rules in this paper, as well as for generating the 
appendix of the full version of this paper [1]. 


4.12 Correctness 


A correctness proof for the full pipeline is not part of our current work. How- 
ever, several of these steps (partial CPS conversion, partial argument lifting, 
defunctionalization, conversion to direct style) are instances of well-established 
techniques. In other cases, such as generalization of continuations (Section 4.2) 
and removal of self-recursive tail-calls (Section 4.6), we have informal proofs using 
equational reasoning [1]. The proof for tail-call removal is currently restricted to 
compositional interpreters. 


5 Evaluation 


We have evaluated our approach to deriving small-step interpreters on a range 
of example languages. Table 2 presents an overview of example big-step specifi- 
cations and their properties, together with their derived small-step counterparts. 
A full listing of the input and output specifications for these case studies appears 
in the appendix to the full version of the paper, which is available online [1]. 




Table 2. Overview of transformed example languages. Input is a given big-step inter- 
preter and our transformations produce a small-step counterpart as output automati- 
cally. “Prems” columns only list structural premises: those that check for a big or small 
step. Unless otherwise stated, environments are used to give meaning to variables and 
they are represented as functions. 


Example | Big-step Rules | Big-step Prems | Small-step Rules | Small-step Prems | New | Features
Call-by-value | 4 | 3 | 7 | 3 | 1 |
Call-by-value, substitution | 4 | 5 | 7 | 4 | 0 | addition
Call-by-value, booleans | 13 | 20 | 24 | 11 | 1 | add., conditional, equality
Call-by-value, pairs | 7 | 7 | 14 | 7 | 1 | pairs, left/right projection
Call-by-value, dynamic scopes | 5 | 5 | 10 | 5 | 1 | add., defunctionalized environments (DEs)
Call-by-value, recursion & iteration | 26 | 44 | 57 | 26 | 6 | fixpoint operator, add., sub., let-expressions, applicative for and while loops, cond., strict and “lazy” conjunction, eq., pairs
Call-by-name | 5 | 5 | 11 | 5 | 2 | add., DEs
Call-by-name, substitution | 4 | 4 | 6 | 3 | 0 | add., DEs
Call-by-name, booleans | 13 | 20 | 25 | 11 | 2 | add., cond., eq., DEs
Call-by-name, pairs | 7 | 7 | 15 | 7 | 2 | pairs, left/right proj., DEs
Minimal imperative | 4 | 4 | 6 | 3 | 0 | add., store without indirection, combined assignment with sequencing
While | 7 | 9 | 14 | 6 | 2 | add., store w/o indir., assign., seq., while
While, environments | 8 | 10 | 17 | 7 | 3 | add., store w/ indir., scoped var. declaration, assign., seq., while
Extended While | 17 | 26 | 33 | 15 | 2 | add., subt., mult., seq., store w/o indir., while, cond., “ints as bools”, equality, “lazy conj.”
Exceptions as state | 8 | 7 | 11 | 3 | 1 | add.
Exceptions as values | 8 | 7 | 10 | 3 | 0 | add.
Call-by-value, exceptions | 21 | 29 | 34 | 12 | 2 | add., div., try block
CBV, exceptions as state | 20 | 26 | 39 | 11 | 8 | add., div., handle & try blocks
CBV, non-determinism | 7 | 7 | 13 | 5 | 2 | add., choice operator
Store rewinding | 8 | 10 | 19 | 8 | 4 | assign., rewinding of the store

For our case studies, we have used call-by-value and call-by-name λ-calculi,
and a simple imperative language as base languages and extended them with 
some common features. Overall, the small-step specifications (as well as the cor- 
responding interpreters) resulting from our transformation are very similar to 
ones we could find in the literature. The differences are either well justified—for 
example, by different handling of value terms—or they are due to new term con- 
structors which could be potentially eliminated by a more powerful translation. 

We evaluated the correctness of our transformation experimentally, by com- 
paring runs of the original big-step and the transformed small-step interpreters, 
as well as by inspecting the interpreters themselves. In a few cases, we proved 
the transformation correct by transcribing the input and output interpreters in 
Coq (as an evaluation relation coupled with a proof of determinism) and proving 
them equivalent. From the examples in Table 2, we have done so for “Call-by-
value”, “Exceptions as state”, and a simplified version of “CBV, exceptions as 
state”. 

We make a few observations about the resulting semantics here. 


New Auxiliary Constructs. In languages that use an environment to look up
values bound to variables, new constructs are introduced to keep the updated
environment as context. These constructs are simple: they have two arguments,
one for the environment (context) and one for the term to be evaluated in that
environment. A congruence rule will ensure steps of the term argument in the
given context and another rule will return the result. The construct kclo1 from
the λ-calculus based examples is a typical example.


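\[
\rho \vdash \mathrm{kclo1}(\rho', \mathrm{val}(v)) \to \mathrm{val}(v)
\qquad
\frac{\rho' \vdash t \to t'}{\rho \vdash \mathrm{kclo1}(\rho', t) \to \mathrm{kclo1}(\rho', t')}
\]


As observed in Section 2, if the environment $\rho''$ is a result of updating an
environment $\rho'$ with a binding of $x$ to $v$, then the app rule


\[
\frac{\rho'' = \mathit{update}\ x\ v\ \rho'}{\rho \vdash \mathrm{app}(\mathrm{clo}(\rho', x, e), v) \to \mathrm{kclo1}(\rho'', e)}
\]


and the above two rules can be replaced with the following rules for app:


\[
\rho \vdash \mathrm{app}(\mathrm{clo}(x, v, \rho'), v_2) \to v
\qquad
\frac{\rho'' = \mathit{update}\ x\ v_2\ \rho' \qquad \rho'' \vdash e \to e'}{\rho \vdash \mathrm{app}(\mathrm{clo}(x, e, \rho'), v_2) \to \mathrm{app}(\mathrm{clo}(x, e', \rho'), v_2)}
\]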
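
To make the role of kclo1 concrete, here is a hand-written Python sketch of a small-step interpreter for a call-by-value λ-calculus that uses such a construct. The tuple encoding, the dictionary environments and the function names `step` and `run` are assumptions for this example; the generated interpreters in the paper may differ in detail.

```python
# Terms:  ("val", v) | ("var", x) | ("lam", x, body) | ("app", e1, e2)
#         plus the auxiliary construct ("kclo1", env, term).
# Values: Python ints or closures ("clo", x, body, env); env is a dict.

def step(t, env):
    """One small step of term t in environment env."""
    tag = t[0]
    if tag == "var":
        return ("val", env[t[1]])
    if tag == "lam":
        return ("val", ("clo", t[1], t[2], env))
    if tag == "app":
        f, a = t[1], t[2]
        if f[0] != "val":
            return ("app", step(f, env), a)     # congruence on the function position
        if a[0] != "val":
            return ("app", f, step(a, env))     # congruence on the argument position
        _, x, body, cenv = f[1]
        return ("kclo1", {**cenv, x: a[1]}, body)   # keep the updated environment
    if tag == "kclo1":
        cenv, body = t[1], t[2]
        if body[0] == "val":
            return body                          # return the result
        return ("kclo1", cenv, step(body, cenv)) # step the body in its own context
    raise ValueError(f"no step for term: {t!r}")

def run(t):
    while t[0] != "val":
        t = step(t, {})
    return t

k_combinator = ("lam", "x", ("lam", "y", ("var", "x")))
program = ("app", ("app", k_combinator, ("val", 1)), ("val", 2))
```

The body of a closure is always stepped in the environment stored in kclo1, not in the environment of the surrounding term.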


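Loops are another common kind of construct that leads to a recurring pattern of
extra auxiliary constructs. For example, the “While” language listed in
Table 2 contains a while-loop with the following big-step rules:

\[
\frac{\langle e_b, \sigma\rangle \Downarrow \langle \mathrm{false}, \sigma'\rangle}{\langle \mathrm{while}(e_b, c), \sigma\rangle \Downarrow \langle \mathrm{skip}, \sigma'\rangle}
\]

\[
\frac{\langle e_b, \sigma\rangle \Downarrow \langle \mathrm{true}, \sigma'\rangle \qquad \langle c, \sigma'\rangle \Downarrow \langle \mathrm{skip}, \sigma''\rangle \qquad \langle \mathrm{while}(e_b, c), \sigma''\rangle \Downarrow \langle v, \sigma'''\rangle}{\langle \mathrm{while}(e_b, c), \sigma\rangle \Downarrow \langle v, \sigma'''\rangle}
\]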

The automatic transformation of these rules introduces two extra constructs,
kwhile1 and ktrue1. The former ensures the full evaluation of the condition
expression, keeping a copy of it together with the while’s body. The latter
construct ensures the full evaluation of while’s body, keeping a copy of the body
together with the condition expression.


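\[
\langle \mathrm{while}(e_b, c), \sigma\rangle \to \langle \mathrm{kwhile1}(c, e_b, e_b), \sigma\rangle
\]

\[
\langle \mathrm{kwhile1}(c, e_b, \mathrm{true}), \sigma\rangle \to \langle \mathrm{ktrue1}(e_b, c, c), \sigma\rangle
\]

\[
\langle \mathrm{kwhile1}(c, e_b, \mathrm{false}), \sigma\rangle \to \langle \mathrm{skip}, \sigma\rangle
\]

\[
\frac{\langle t, \sigma\rangle \to \langle t', \sigma'\rangle}{\langle \mathrm{kwhile1}(c, e_b, t), \sigma\rangle \to \langle \mathrm{kwhile1}(c, e_b, t'), \sigma'\rangle}
\]

\[
\langle \mathrm{ktrue1}(e_b, c, \mathrm{skip}), \sigma\rangle \to \langle \mathrm{while}(e_b, c), \sigma\rangle
\]

\[
\frac{\langle t, \sigma\rangle \to \langle t', \sigma'\rangle}{\langle \mathrm{ktrue1}(e_b, c, t), \sigma\rangle \to \langle \mathrm{ktrue1}(e_b, c, t'), \sigma'\rangle}
\]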


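We observe that in a language with a conditional and a sequencing construct
we can find terms corresponding to kwhile1 and ktrue1:


\[
\begin{aligned}
\mathrm{kwhile1}(c, e_b, e_b') &\approx \mathrm{if}(e_b', \mathrm{seq}(c, \mathrm{while}(e_b, c)), \mathrm{skip})\\
\mathrm{ktrue1}(e_b, c, c') &\approx \mathrm{seq}(c', \mathrm{while}(e_b, c))
\end{aligned}
\]


The small-step semantics of while could then be simplified to a single rule.


\[
\langle \mathrm{while}(e_b, c), \sigma\rangle \to \langle \mathrm{if}(e_b, \mathrm{seq}(c, \mathrm{while}(e_b, c)), \mathrm{skip}), \sigma\rangle
\]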
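
The simplified rule can be exercised in a small hand-written stepper. The Python sketch below assumes a tuple encoding of a While-like language with a dictionary store; the constructors `lt`, `add`, `assign`, `seq` and `if`, as well as the function names, are assumptions chosen for the illustration and not the exact "While" language of Table 2.

```python
# Expressions: ("val", v) | ("var", x) | ("add", e1, e2) | ("lt", e1, e2)
# Commands:    ("skip",) | ("assign", x, e) | ("seq", c1, c2)
#              | ("if", e, c1, c2) | ("while", e, c)
SKIP = ("skip",)

def is_val(t):
    return t[0] == "val"

def step(t, s):
    """One small step on configuration (t, s); returns the next configuration."""
    tag = t[0]
    if tag == "var":
        return ("val", s[t[1]]), s
    if tag in ("add", "lt"):
        a, b = t[1], t[2]
        if not is_val(a):
            a2, s2 = step(a, s)
            return (tag, a2, b), s2
        if not is_val(b):
            b2, s2 = step(b, s)
            return (tag, a, b2), s2
        return ("val", a[1] + b[1] if tag == "add" else a[1] < b[1]), s
    if tag == "assign":
        x, e = t[1], t[2]
        if not is_val(e):
            e2, s2 = step(e, s)
            return ("assign", x, e2), s2
        return SKIP, {**s, x: e[1]}
    if tag == "seq":
        c1, c2 = t[1], t[2]
        if c1 == SKIP:
            return c2, s
        c1b, s2 = step(c1, s)
        return ("seq", c1b, c2), s2
    if tag == "if":
        e, c1, c2 = t[1], t[2], t[3]
        if not is_val(e):
            e2, s2 = step(e, s)
            return ("if", e2, c1, c2), s2
        return (c1 if e[1] else c2), s
    if tag == "while":
        # The single unfolding rule: no auxiliary kwhile1 or ktrue1 needed.
        return ("if", t[1], ("seq", t[2], t), SKIP), s
    raise ValueError(f"no step for term: {t!r}")

def run(t, s):
    while t != SKIP:
        t, s = step(t, s)
    return s

loop = ("while", ("lt", ("var", "x"), ("val", 3)),
        ("assign", "x", ("add", ("var", "x"), ("val", 1))))
```

Running `loop` from a store where `x` is 0 repeatedly unfolds the loop into a conditional until the condition evaluates to false.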


Our current, straightforward way of deriving term-continuation equivalents
is not capable of finding these equivalences. In future work, we want to explore 
external tools, such as SMT solvers, to facilitate searching for translations from 
continuations to terms. This search could be possibly limited to a specific term 
depth. 


Exceptions as Values. We tested our transformations with two ways of represent- 
ing exceptions in big-step semantics currently supported by our input language: 
as values and as state. Representing exceptions as values appears to be more 
common and is used, for example, in the big-step specification of Standard ML 
[24], or in [6] in connection with pretty big-step semantics. Given a big-step spec- 
ification (or interpreter) in this style, the generated small-step semantics handles 
exceptions correctly (based on our experiments). However, since exceptions are 
just values, propagation to top-level is spread out across multiple steps — depend- 
ing on the depth of the term which raised the exception. The following example 
illustrates this behavior. 


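\[
\begin{aligned}
&\mathrm{add}(1, \mathrm{add}(2, \mathrm{add}(\mathrm{raise}(3), \mathrm{raise}(4)))) \to \mathrm{add}(1, \mathrm{add}(2, \mathrm{add}(\mathrm{exc}(3), \mathrm{raise}(4))))\\
&\qquad \to \mathrm{add}(1, \mathrm{add}(2, \mathrm{exc}(3))) \to \mathrm{add}(1, \mathrm{exc}(3)) \to \mathrm{exc}(3)
\end{aligned}
\]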
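
This step-by-step propagation can be reproduced with a small hand-written stepper. The Python sketch below assumes a tuple encoding in which numerals are `("val", n)` terms and raised exceptions are `("exc", n)` terms; the function names are hypothetical, and the rules are written by hand to mimic the behaviour shown above rather than generated by the transformation.

```python
# Terms: ("val", n) | ("exc", n) | ("raise", e) | ("add", e1, e2)

def done(t):
    """Values and exception values take no further steps."""
    return t[0] in ("val", "exc")

def step(t):
    tag = t[0]
    if tag == "raise":
        e = t[1]
        if e[0] == "val":
            return ("exc", e[1])        # raise turns a value into an exception
        if e[0] == "exc":
            return e                    # propagate an exception from the argument
        return ("raise", step(e))
    if tag == "add":
        a, b = t[1], t[2]
        if a[0] == "exc":
            return a                    # propagate one level upwards per step
        if not done(a):
            return ("add", step(a), b)
        if b[0] == "exc":
            return b
        if not done(b):
            return ("add", a, step(b))
        return ("val", a[1] + b[1])
    raise ValueError(f"no step for term: {t!r}")

def trace(t):
    seen = [t]
    while not done(t):
        t = step(t)
        seen.append(t)
    return seen

def num(n):
    return ("val", n)

example = ("add", num(1), ("add", num(2),
           ("add", ("raise", num(3)), ("raise", num(4)))))
```

The trace of `example` has one step for raising the exception and one step for each enclosing `add` that the exception has to pass through.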

Since we expect the input semantics to be deterministic and the propagation
of exceptions in the resulting small-step semantics follows the original big-step semantics,
this “slow” propagation is not a problem, even if it does not take advantage of 
“fast” propagation via labels or state. A possible solution we are considering for 
future work is to let the user flag values in the big-step semantics and translate 
such values as labels on arrows or a state change to allow propagating them in 
a single step. 


Exceptions as State. Another approach to specifying exceptions is to use a flag 
in the configuration. Rules may be specified so that they only apply if the incom- 
ing state has no exception indicated. As with the exceptions-as-values approach, 
propagation rules have to be written to terminate a computation early if a com- 
putation of a subterm indicates an exception. Observe the exception propagation 
rule for add and the exception handling rule for try. 


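\[
\frac{\langle e_1, \sigma, \mathrm{ok}\rangle \Downarrow \langle v_1, \sigma', \mathrm{ex}\rangle}{\langle \mathrm{add}(e_1, e_2), \sigma, \mathrm{ok}\rangle \Downarrow \langle \mathrm{skip}, \sigma', \mathrm{ex}\rangle}
\]

\[
\frac{\langle e_1, \sigma, \mathrm{ok}\rangle \Downarrow \langle v_1, \sigma', \mathrm{ex}\rangle \qquad \langle e_2, \sigma', \mathrm{ok}\rangle \Downarrow \langle v_2, \sigma'', \mathrm{ok}\rangle}{\langle \mathrm{try}(e_1, e_2), \sigma, \mathrm{ok}\rangle \Downarrow \langle v_2, \sigma'', \mathrm{ok}\rangle}
\]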


Using state to propagate exceptions is mentioned in connection with small- 
step SOS in [4]. While this approach has the potential advantage of manifesting 
the currently raised exception immediately at the top-level, it also poses a prob- 
lem of locality. If an exception is reinserted into the configuration, it might 
become decoupled from the original site. This can result, for example, in the 
wrong handler catching the exception in a following step. Our transformation 
deals with this style of exceptions naturally by preserving more continuations 
in the final interpreter. After being raised, an exception is inserted into the 
state and propagated to top-level by congruence rules. However, it will only be 
caught after the corresponding subterm has been evaluated, or rather, a value 
has been propagated upwards to signal a completed computation. This behavior 
corresponds to exception handling in big-step rules, only it is spread out over 
multiple steps. Continuations are kept in the final language to correspond to 
stages of computation and thus, to preserve the locality of a raised exception. 
A handler will only handle an exception once the raising subterm has become a 
value. Hence, the exception will be intercepted by the innermost handler — even 
if the exception is visible at the top-level of a step. 

Based on our experiments, the exception-as-state handling in the generated 
small-step interpreters is a truthful unfolding of the big-step evaluation process. 
This is further supported by our ad-hoc proofs of equivalence between input and 
output interpreters. However, the generated semantics suffers from a blowup in 
the number of rules and moves away from the usual small-step propagation and 
exception handling in congruence rules. We see this as a shortcoming of the trans- 
formation. To overcome this, we briefly experimented with a case-floating stage, 
which would result in catching exceptions in the congruence cases of
continuations. Using such a transformation, the resulting interpreter would more
closely mirror the standard small-step treatment of exceptions as signals. However,
the conditions under which this transformation should be triggered need to be
considered carefully, and we leave this for future work.


Limited Non-determinism. In the present work, our aim was to only consider 
deterministic semantics implemented as an interpreter in a functional program- 
ming language. However, since cases of the interpreter are considered indepen- 
dently in the transformation, some forms of non-determinism in the input seman- 
tics get translated correctly. For example, the following internal choice construct 
(cf. CSP’s $\sqcap$ operator [5,17]) gets transformed correctly. The straightforward
big-step rules are transformed into small-step rules as expected. Of course, one 
has to keep in mind that these rules are interpreted as ordered, that is, the first 
rule in both styles will always apply. 


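\[
\frac{e_1 \Downarrow v_1}{\mathrm{choose}(e_1, e_2) \Downarrow v_1}
\qquad
\mathrm{choose}(e_1, e_2) \to e_1
\]

\[
\frac{e_2 \Downarrow v_2}{\mathrm{choose}(e_1, e_2) \Downarrow v_2}
\qquad
\mathrm{choose}(e_1, e_2) \to e_2
\]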


6 Related Work 


In their short paper [18], the authors propose a direct syntactic way of deriving 
small-step rules from big-step ones. Unlike our approach, based on manipulating 
control flow in an interpreter, their transformation applies to a set of inference 
rules. While axioms are copied over directly, for conditional rules a stack is 
added to the configuration to keep track of evaluation. For each conditional big- 
step rule, an auxiliary construct and 4 small-step rules are generated. Results of 
“premise computations” are accumulated and side-conditions are only discharged 
at the end of such a computation sequence. For this reason, we can view the 
resulting semantics more as a “leap” semantics, which makes it less suitable for 
a semantics-based interpreter or debugger. A further disadvantage is that the 
resulting semantics is far removed from a typical small-step specification with a 
higher potential for blow-up as 4 rules are introduced for each conditional rule. 
On the other hand, the delayed unification of meta-variables and discharging of 
side-conditions potentially makes the transformation applicable to a wider array 
of languages, including those where control flow is not as explicit. 

In [2], the author explores an approach to constructing abstract machines 
from big-step (natural) specifications. It applies to a class of big-step specifica- 
tions called L-attributed big-step semantics, which allows for sufficiently inter- 
esting languages. The extracted abstract machines use a stack of evaluation 
contexts to keep track of the stages of computations. In contrast, our trans- 
formed interpreters rebuild the context via congruence rules in each step. While 
this is less efficient as a computation strategy, the intermediate results of the 
computation are visible in the context of the original program, in line with usual
SOS specifications. 

A significant body of work has been developed on transformations that take 
a form of small-step semantics (usually an interpreter) and produce a big-step- 
style interpreter. The relation between semantic specifications, interpreters and 
abstract machines has been thoroughly investigated, mainly in the context of 
reduction semantics [10–13,26]. In particular, our work was inspired by and is
based on Danvy’s work on refocusing in reduction semantics [13] and on use of 
CPS conversion and defunctionalization to convert between representations of 
control in interpreters [11]. 

A more direct approach to deriving big-step semantics from small-step is
taken by the authors of [4], where a small-step Modular SOS specification is
transformed into a pretty-big-step one. This is done by introducing reflexivity and
transitivity rules into a specification, along with a “refocus” rule which effectively 
compresses a transition sequence into a single step. The original small-step rules 
are then specialized with respect to these new rules, yielding refocused rules in 
the style of pretty-big-step semantics [6]. A related approach is by Ciobaca [7], 
where big-step rules are generated for a small-step semantics. The big-step rules 
are, again, close to a pretty-big-step style. 


7 Conclusion and Future Work 


We have presented a stepwise functional derivation of a small-step interpreter 
from a big-step one. This derivation proceeds through a sequence of, mostly 
basic, transformation steps. First, the big-step evaluation function is converted 
into continuation-passing style to make control-flow explicit. Then, the contin- 
uations are generalized (or lifted) to handle non-value inputs. The non-value 
cases correspond to congruence rules in small-step semantics. After defunction- 
alization, we remove self-recursive calls, effectively converting the recursive inter- 
preter into a stepping function. The final major step of the transformation is to 
decide which continuations will have to be introduced as new auxiliary terms 
into the language. We have evaluated our approach on several languages cov- 
ering different features. For most of these, the transformation yields small-step 
semantics which are close to ones we would normally write by hand. 

We see this work as an initial exploration of automatic transformations of big- 
step semantics into small-step counterparts. We identified a few areas where the 
current process could be significantly improved. These include applying better 
equational reasoning to identify terms equivalent to continuations, or transform- 
ing exceptions as state in a way that would avoid introducing many intermediate 
terms and would better correspond to usual signal handling in small-step SOS. 
Another research avenue is to fully verify the transformations in an interactive 
theorem prover, with the possibility of extracting a correct transformer from the 
proofs. 


Abstract. Traditionally, reasoning about programs under varying
evaluation regimes (call-by-value, call-by-name etc.) was done at the meta-
level, treating them as term rewriting systems. Levy’s call-by-push-value 
(CBPV) calculus provides a more powerful approach for reasoning, by 
treating CBPV terms as a common intermediate language which captures 
both call-by-value and call-by-name, and by allowing equational reason- 
ing about changes to evaluation order between or within programs. 

We extend CBPV to additionally deal with call-by-need, which is non- 
trivial because of shared reductions. This allows the equational reasoning 
to also support call-by-need. As an example, we then prove that call- 
by-need and call-by-name are equivalent if nontermination is the only 
side-effect in the source language. 

We then show how to incorporate an effect system. This enables us to 
exploit static knowledge of the potential effects of a given expression to 
augment equational reasoning; thus a program fragment might be invari- 
ant under change of evaluation regime only because of knowledge of its 
effects. 


Keywords: Evaluation order · Call-by-need · Call-by-push-value ·
Logical relations · Effect systems


1 Introduction 


Programming languages based on the λ-calculus have different semantics
depending on the reduction strategy employed. Three common variants are call- 
by-value, call-by-name and call-by-need (with the third sometimes also referred 
to as “lazy evaluation” when data constructors defer evaluation of arguments 
until the data structure is traversed). Reasoning about such programs and their 
equivalence under varying reduction strategies can be difficult as we have to 
reason about meta-level reduction strategies and not merely at the object level. 

Levy [17] introduced call-by-push-value (CBPV) to improve the situation. 
CBPV is a calculus with separated notions of value and computation. A charac- 
teristic feature is that each CBPV program encodes its own evaluation order. It is 
best seen as an intermediate language into which lambda-calculus-based
source-language programs can be translated. Moreover, CBPV is powerful enough that
programs employing call-by-value or call-by-name (or even a mixture) can be 
simply translated into it, giving an object-calculus way to reason about the 
meta-level concept of reduction order. 

However, CBPV does not enable us to reason about call-by-need evaluation. 
An intuitive reason is that call-by-need has “action at a distance” in that reduc- 
tion of one subterm causes reduction of all other subterms that originated as 
copies during variable substitution. Indeed call-by-need is often framed using 
mutable stores (graph reduction [32], or reducing a thunk which is accessed by 
multiple pointers [16]). CBPV does not allow these to be encoded. 

This work presents extended call-by-push-value (ECBPV), a calculus sim- 
ilar to CBPV, but which can capture call-by-need reduction in addition to 
call-by-value and call-by-name. Specifically, ECBPV adds an extra primitive 
M need x. N which runs N, with M being evaluated the first time x is used. On
subsequent uses of x, the result of the first run is returned immediately. The term 
M is evaluated at most once. We give the syntax and type system of ECBPV, 
together with an equational theory that expresses when terms are considered 
equal. 

A key justification for an intermediate language that can express several 
evaluation orders is that it enables equivalences between the evaluation orders 
to be proved. If there are no (side-)effects at all in the source language, then 
call-by-need, call-by-value and call-by-name should be semantically equivalent. 
If the only effect is nondeterminism, then need and value (but not name) are 
equivalent. If the only effect is nontermination then need and name (but not 
value) are equivalent. We show that ECBPV can be used to prove such equiva- 
lences by proving the latter using an argument based on Kripke logical relations 
of varying arity [12]. 

These equivalences rely on the language being restricted to particular effects. 
However, one may wish to switch evaluation order for subprograms restricted to 
particular effects, even if the language itself does not have such a restriction. 
To allow reasoning to be applied to these cases, we add an effect system [20] to 
ECBPV, which allows the side-effects of subprograms to be statically estimated. 
This allows us to determine which parts of a program are invariant under changes 
in evaluation order. As we will see, support for call-by-need (and action at a 
distance more generally) makes describing an effect system significantly more 
difficult than for call-by-value. 


Contributions. We make the following contributions: 


— We describe extended call-by-push-value, a version of CBPV containing an 
extra construct that adds support for call-by-need. We give its syntax, type 
system, and equational theory (Sect. 2). 

— We describe two translations from a lambda-calculus source language into 
ECBPV: one for call-by-name and one for call-by-need (the first such transla- 
tion) (Sect. 3). We then show that, if the source language has nontermination 
as the only effect, call-by-name and call-by-need are equivalent. 


Extended Call-by-Push- Value 237 


— We refine the type system of ECBPV so that its types also carry effect infor- 
mation (Sect. 4). This allows equivalences between evaluation orders to be 
exploited, both at ECBPV and source level, when subprograms are statically 
limited to particular effects. 


2 Extended Call-by-Push-Value 


We describe an extension to call-by-push-value with support for call-by-need. 
The primary difference between ordinary CBPV and ECBPV is the addition 
of a primitive that allows computations to be added to the environment, so 
that they are evaluated only the first time they are used. Before describing this 
change, we take a closer look at CBPV and how it supports call-by-value and 
call-by-name. 

CBPV stratifies terms into values, which do not have side-effects, and 
computations, which might. Evaluation order is irrelevant for values, so we are 
only concerned with how computations are sequenced. There is exactly one prim- 
itive that causes the evaluation of more than one computation, which is the 
computation M to x. N. This means run the computation M, bind the result to
x, and then run the computation N. (It is similar to M >>= \x -> N in Haskell.)
The evaluation order is fixed: M is always eagerly evaluated. This construct can 
be used to implement call-by-value: to apply a function, eagerly evaluate the 
argument and then evaluate the body of the function. No other constructs cause 
the evaluation of more than one computation. 

To allow more control over evaluation order, CBPV allows computations 
to be thunked. The term thunk M is a value that contains the thunk of the 
computation M. Thunks can be duplicated (to allow a single computation to be 
evaluated more than once), and can be converted back into computations with 
force V. This allows call-by-name to be implemented: arguments to functions 
are thunked computations. Arguments are used by forcing them, so that the 
computation is evaluated every time the argument is used. Effectively, there is 
a construct M name $\underline{x}$. N, which evaluates M each time the variable
$\underline{x}$ is used by N, rather than eagerly evaluating. (The variable
$\underline{x}$ is underlined here to indicate that it refers to a computation
rather than a value: uses of it may have side-effects.)

To support call-by-need, extended call-by-push-value adds another construct 
M need x. N. This term runs the computation N, with the computation M being
evaluated the first time x is used. On subsequent uses of x, the result of the first 
run is returned immediately. The computation M is evaluated at most once. This 
new construct adds the “action at a distance” missing from ordinary CBPV. 

We briefly mention that adding general mutable references to call-by-push- 
value would allow call-by-need to be encoded. However, reasoning about evalu- 
ation order would be difficult, and so we do not take this option. 

2.1 Syntax


The syntax of extended call-by-push-value is given in Fig. 1. The highlighted 
parts are new here. The rest of the syntax is similar to CBPV.¹


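$$
\begin{array}{rcl}
V, W & ::= & c \mid x \mid (V_1, V_2) \mid \mathsf{fst}\ V \mid \mathsf{snd}\ V \mid \mathsf{inl}\ V \mid \mathsf{inr}\ V \\
     & \mid & \mathsf{case}\ V\ \mathsf{of}\ \{\mathsf{inl}\ x.\ W_1,\ \mathsf{inr}\ y.\ W_2\} \mid \mathsf{thunk}\ M \\
M, N & ::= & \underline{x} \mid \mathsf{force}\ V \mid \lambda\{i.\ M_i\}_{i \in I} \mid i\,`M \mid \lambda x.\ M \mid V\,`M \mid \mathsf{return}\ V \\
     & \mid & M\ \mathsf{to}\ x.\ N \mid M\ \mathsf{need}\ \underline{x}.\ N \\
A, B & ::= & \mathsf{unit} \mid A_1 \times A_2 \mid A_1 + A_2 \mid \mathbf{U}\,C \\
C, D & ::= & \prod_{i \in I} C_i \mid A \to C \mid \mathbf{Fr}\,A \\
\Gamma & ::= & \diamond \mid \Gamma, x : A \mid \Gamma, \underline{x} : \mathbf{Fr}\,A
\end{array}
$$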


Fig. 1. Syntax of ECBPV 


We assume two sets of variables: value variables $x, y, \ldots$ and computation
variables $\underline{x}, \underline{y}, \ldots$ (written underlined). While ordinary
CBPV does not include computation variables, they do not of themselves add any
expressive power to the calculus. The ability to use call-by-need in ECBPV comes
from the need construct used to bind the variable.²

There are two kinds of terms, value terms V, W which do not have side-effects
(in particular, are strongly normalizing), and computation terms M, N which
might have side-effects. Value terms include constants c, and specifically the
constant () of type unit. There are no constant computation terms; value
constants suffice (see Sect. 3 for an example). The value term thunk M suspends the
computation M; the computation term force V runs the suspended computation
V. Computation terms also include I-ary tuples λ{i. M_i}_{i∈I} (where I ranges over
finite sets); the ith projection of a tuple M is i‘M. Functions send values to
computations, and are computations themselves. Application is written V‘M, where
V is the argument and M is the function to apply. The term return V is a
computation that just returns the value V, without causing any side-effects. Eager
sequencing of computations is given by M to x. N, which evaluates M until it
returns a value, then places the result in x and evaluates N. For example, in
M to x. return (x, x), the term M is evaluated once, and the result is duplicated.
In M to x. return (), the term M is still evaluated once, but its result is never
used. Syntactically, both to and need (explained below) are right-associative (so
M1 to x. M2 to y. M3 means M1 to x. (M2 to y. M3)).

¹ The only difference is that eliminators of product and sum types are value terms
rather than computation terms (which makes value terms slightly more general).
Levy [17] calls this CBPV with complex values.

² Computation variables are not strictly required to support call-by-need (since we can
use x : U (Fr A) instead of $\underline{x}$ : Fr A), but they simplify reasoning about
evaluation order, and therefore we choose to include them.

The primary new construct is M need x. N. This term evaluates N. The first
time x is evaluated (due to a use of x inside N) it behaves the same as the 
computation M. If M returns a value V, then subsequent uses of x behave the 
same as return V. Hence only the first use of x will evaluate M. If x is not used 
then M is not evaluated at all. The computation variable x bound inside the 
term is primarily used by eagerly sequencing it with other computations. For 
example, 


M need x. x to y. x to z. return (y, z)


uses x twice: once where the result is bound to y, and once where the result is 
bound to z. Only the first of these uses will evaluate M, so this term has the 
same semantics as M to x. return(x, x). The term M need x. return () does not 
evaluate M at all, and has the same semantics as return (). 
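
The two examples above can be mimicked operationally. The following is a minimal Python sketch, not part of the calculus: computations are modelled as zero-argument callables, and the names `ret`, `to`, `need` and `tick` are hypothetical helpers introduced only for illustration. It relies on a mutable cache, which ECBPV itself deliberately avoids, so it should be read as an intuition aid rather than as a semantics.

```python
# Illustrative model only: a computation is a zero-argument callable whose
# call may perform effects and returns a value.

def ret(v):
    """return V: a computation that returns v without effects."""
    return lambda: v

def to(m, body):
    """M to x. N: run m eagerly, then run body(x), where x is m's result."""
    def run():
        x = m()
        return body(x)()
    return run

def need(m, body):
    """M need x. N: run body(x), where x is a computation that evaluates m
    on its first use and replays the cached result afterwards."""
    def run():
        cache = []  # fresh per run, so separate executions share nothing
        def x():
            if not cache:
                cache.append(m())
            return cache[0]
        return body(x)()
    return run

log = []

def tick():
    """An effectful computation: records that it ran, then returns 42."""
    log.append("tick")
    return 42

# M need x. x to y. x to z. return (y, z)
twice = need(tick, lambda x: to(x, lambda y: to(x, lambda z: ret((y, z)))))
result_twice = twice()
runs_twice = len(log)

# M need x. return ()
log.clear()
unused = need(tick, lambda x: ret(()))
result_unused = unused()
runs_unused = len(log)
```

Under these assumptions, `twice` runs `tick` once even though x is used twice, mirroring M to x. return (x, x), while `unused` never runs it, mirroring return ().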

With the addition of need it is not in general possible to determine the order 
in which computations are executed statically. Uses of computation variables 
are given statically, but not all of these actually evaluate the corresponding 
computation dynamically. In general, the set of uses of computation variables 
that actually cause effects depends on run-time behaviour. This will be important 
when describing the effect system in Sect. 4. 

The standard capture-avoiding substitution of value variables in value terms 
is denoted V[x ↦ W]. We similarly have substitutions of value variables in
computation terms, computation variables in value terms, and computation variables
in computation terms. Finally, we define the call-by-name construct mentioned 
above as syntactic sugar for other CBPV primitives: 


M name x. N := thunk M ‘ λy. N[x ↦ force y]


where y is not free in N. 
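
To make the desugaring concrete, here is a small Python sketch under the same informal reading as before: computations are zero-argument callables and a thunk is a computation passed around as a value. The helper names `thunk`, `force`, `name`, `tick` and `use_twice` are assumptions made for this illustration and do not come from the paper.

```python
# Illustrative model only: computations are zero-argument callables, and a
# thunk is simply a computation handed around as a value.

def thunk(m):
    """thunk M: suspend the computation m as a value."""
    return m

def force(v):
    """force V: the computation that runs the suspended computation v."""
    return lambda: v()

def name(m, body):
    """M name x. N, following the desugaring into a function applied to
    thunk M: every use of x inside the body forces the thunk again."""
    function = lambda y: body(force(y))  # the function whose body is N[x := force y]
    return function(thunk(m))            # apply it to thunk M

calls = []

def tick():
    """An effectful computation: counts how often it has been run."""
    calls.append("tick")
    return len(calls)

def use_twice(x):
    """N = x to y. x to z. return (y, z), written out directly."""
    def run():
        y = x()
        z = x()
        return (y, z)
    return run

pair = name(tick, use_twice)()
```

Here each use of x re-runs `tick`, so the two components of `pair` differ; with need in place of name the computation would run only once.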

Types are stratified into value types A, B and computation types C, D. Value 
types include the unit type, products and sum types. (It is easy to add further 
base types; we omit Levy’s empty types for simplicity.) Value types also include 
thunk types UC, which are introduced by thunk M and eliminated by force V. 
Computation types include I-ary product types ∏_{i∈I} C_i for finite I, function
types A → C, and returner types Fr A. The latter are introduced by return V,
and are the only types of computation that can appear on the left of either
to or need (which are the eliminators of returner types). The type constructors
U and Fr form an adjunction in categorical models. Finally, contexts Γ map
value variables to value types, and computation variables to computation types 
of the form Fr A. This restriction is due to the fact that the only construct 
that binds computation variables is need, which only sequences computations of 
returner type. Allowing computation variables to be associated with other forms 
of computation type in typing contexts is therefore unnecessary. Typing contexts 
are ordered lists. 

The syntax is parameterized by a signature, containing the constants c. 

Definition 1 (Signature). A signature K consists of a set K_A of constants of
type A for each value type A. All signatures contain () ∈ K_unit.


2.2 Type System 


The type system of extended call-by-push-value is a minor extension of the type 
system of ordinary call-by-push-value. Assume a fixed signature K. There are 
two typing judgements, one for value types and one for computation types. The 
rules for the value typing judgement $\Gamma \vdash_v V : A$ and the computation typing
judgement $\Gamma \vdash M : C$ are given in Fig. 2. Rules that add a new variable to
the typing context implicitly require that the variable does not already appear 
in the context. The type system admits the usual weakening and substitution 
properties for both value and computation variables. 


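Value typing judgement $\Gamma \vdash_v V : A$:

$$
\dfrac{}{\Gamma \vdash_v x : A}\ \text{if } (x : A) \in \Gamma
\qquad
\dfrac{}{\Gamma \vdash_v c : A}\ \text{if } c \in K_A
\qquad
\dfrac{\Gamma \vdash M : C}{\Gamma \vdash_v \mathsf{thunk}\ M : \mathbf{U}\,C}
$$

$$
\dfrac{\Gamma \vdash_v V_1 : A_1 \quad \Gamma \vdash_v V_2 : A_2}{\Gamma \vdash_v (V_1, V_2) : A_1 \times A_2}
\qquad
\dfrac{\Gamma \vdash_v V : A_1 \times A_2}{\Gamma \vdash_v \mathsf{fst}\ V : A_1}
\qquad
\dfrac{\Gamma \vdash_v V : A_1 \times A_2}{\Gamma \vdash_v \mathsf{snd}\ V : A_2}
$$

$$
\dfrac{\Gamma \vdash_v V : A_1}{\Gamma \vdash_v \mathsf{inl}\ V : A_1 + A_2}
\qquad
\dfrac{\Gamma \vdash_v V : A_2}{\Gamma \vdash_v \mathsf{inr}\ V : A_1 + A_2}
$$

$$
\dfrac{\Gamma \vdash_v V : A_1 + A_2 \quad \Gamma, x : A_1 \vdash_v W_1 : B \quad \Gamma, y : A_2 \vdash_v W_2 : B}
      {\Gamma \vdash_v \mathsf{case}\ V\ \mathsf{of}\ \{\mathsf{inl}\ x.\ W_1,\ \mathsf{inr}\ y.\ W_2\} : B}
$$

Computation typing judgement $\Gamma \vdash M : C$:

$$
\dfrac{}{\Gamma \vdash \underline{x} : \mathbf{Fr}\,A}\ \text{if } (\underline{x} : \mathbf{Fr}\,A) \in \Gamma
\qquad
\dfrac{\Gamma \vdash_v V : A}{\Gamma \vdash \mathsf{return}\ V : \mathbf{Fr}\,A}
\qquad
\dfrac{\Gamma \vdash_v V : \mathbf{U}\,C}{\Gamma \vdash \mathsf{force}\ V : C}
$$

$$
\dfrac{(\Gamma \vdash M_i : C_i)_{i \in I}}{\Gamma \vdash \lambda\{i.\ M_i\}_{i \in I} : \prod_{i \in I} C_i}
\qquad
\dfrac{\Gamma \vdash M : \prod_{i \in I} C_i}{\Gamma \vdash i\,`M : C_i}
$$

$$
\dfrac{\Gamma, x : A \vdash M : C}{\Gamma \vdash \lambda x.\ M : A \to C}
\qquad
\dfrac{\Gamma \vdash_v V : A \quad \Gamma \vdash M : A \to C}{\Gamma \vdash V\,`M : C}
$$

$$
\dfrac{\Gamma \vdash M : \mathbf{Fr}\,A \quad \Gamma, x : A \vdash N : C}{\Gamma \vdash M\ \mathsf{to}\ x.\ N : C}
\qquad
\dfrac{\Gamma \vdash M : \mathbf{Fr}\,A \quad \Gamma, \underline{x} : \mathbf{Fr}\,A \vdash N : C}{\Gamma \vdash M\ \mathsf{need}\ \underline{x}.\ N : C}
$$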


Fig. 2. Typing rules for ECBPV 

It should be clear that ECBPV is actually an extension of call-by-push-value.
CBPV terms embed as terms that never use the highlighted forms. We translate 
call-by-need by encoding call-by-need functions as terms of the form 


λx’. (force x’) need x. M


where x’ is not free in M. This is a call-by-push-value function that accepts a 
thunk as an argument. The thunk is added to the context, and the body of the 
function is executed. The first time the argument is used (via x), the computation 
inside the thunk is evaluated. Subsequent uses do not run the computation again. 
A translation based on this idea from a call-by-need source language is given in 
detail in Sect. 3.2. 
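
As an informal illustration of this encoding, the Python sketch below models a call-by-need function as something that receives a thunk and binds it lazily. Thunks and computations are both zero-argument callables here, and `by_need`, `expensive`, `add_self` and `ignore` are hypothetical names chosen for the example; the mutable cache is an implementation device of the sketch, not a feature of ECBPV.

```python
def by_need(body):
    """Model of the term  lambda x'. (force x') need x. M :
    the function takes a thunk and binds it to x by need."""
    def function(arg_thunk):
        def run():
            cache = []
            def x():
                if not cache:
                    cache.append(arg_thunk())  # force x', on first use only
                return cache[0]
            return body(x)()
        return run
    return function

evaluations = []

def expensive():
    """The argument computation: records each time it is actually run."""
    evaluations.append("run")
    return 7

add_self = by_need(lambda x: lambda: x() + x())  # body uses its argument twice
ignore = by_need(lambda x: lambda: 0)            # body never uses its argument

sum_result = add_self(expensive)()
runs_after_sum = len(evaluations)
ignore_result = ignore(expensive)()
runs_after_ignore = len(evaluations)
```

In this model the argument is evaluated once when it is used twice and not at all when it is ignored, which is the behaviour the encoding is meant to capture.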


2.3 Equational Theory 


In this section, we present the equational theory of extended call-by-push-value. 
This is an extension of the equational theory for CBPV given by Levy [17] to 
support our new constructs. It consists of two judgement forms, one for values 
and one for computations: 


$$\Gamma \vdash_v V = W : A \qquad\qquad \Gamma \vdash M = N : C$$


These mean both terms are well typed, and are considered equal by the equa- 
tional theory. We frequently omit the context and type when they are obvious 
or unimportant. 

The definition is given by the axioms in Fig. 3. Note that these axioms only 
hold when the terms they mention have suitable types, and when suitable con- 
straints on free variables are satisfied. For example, the second sequencing axiom 
holds only if x is not free in N. These conditions are left implicit in the figure. 
The judgements are additionally reflexive (assuming the typing holds), symmet- 
ric and transitive. They are also closed under all possible congruence rules. There 
are no restrictions on congruence related to evaluation order. None are neces- 
sary because ECBPV terms make the evaluation order explicit: all sequencing of 
computations uses to and need. Finally, note that enriching the signature with 
additional constants will in general require additional axioms capturing their 
behaviour; Sect. 3 exemplifies this for constants $\bot_A$ representing nontermination.

For the equational theory to capture call-by-need, we might expect compu- 
tation terms that are not of the form return V to never be duplicated, since they 
should not be evaluated more than once. There are two exceptions to this. Such 
terms can be duplicated in the axioms that duplicate value terms (such as the β
laws for sum types). In this case, the syntax ensures such terms are thunked. This 
is correct because we should allow these terms to be executed once in each sepa- 
rate execution of a computation (and separate executions arise from duplication 
of thunks). We are only concerned with duplication within a single computation. 
Computation terms can also be duplicated across multiple elements of a tuple 
λ{i. M_i}_{i∈I} of computation terms. This is also correct, because only one component
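$$
\begin{array}{rcll}
\Gamma \vdash_v \mathsf{fst}\ (V_1, V_2) & = & V_1 & : A_1 \\
\Gamma \vdash_v \mathsf{snd}\ (V_1, V_2) & = & V_2 & : A_2 \\
\Gamma \vdash_v \mathsf{case}\ \mathsf{inl}\ V\ \mathsf{of}\ \{\mathsf{inl}\ x.\ W_1,\ \mathsf{inr}\ y.\ W_2\} & = & W_1[x \mapsto V] & : B \\
\Gamma \vdash_v \mathsf{case}\ \mathsf{inr}\ V\ \mathsf{of}\ \{\mathsf{inl}\ x.\ W_1,\ \mathsf{inr}\ y.\ W_2\} & = & W_2[y \mapsto V] & : B \\
\Gamma \vdash \mathsf{force}\ (\mathsf{thunk}\ M) & = & M & : C \\
\Gamma \vdash i\,`\lambda\{i.\ M_i\}_{i \in I} & = & M_i & : C_i \\
\Gamma \vdash V\,`\lambda x.\ M & = & M[x \mapsto V] & : C \\
\Gamma \vdash \mathsf{return}\ V\ \mathsf{to}\ x.\ M & = & M[x \mapsto V] & : C \\
\Gamma \vdash \mathsf{return}\ V\ \mathsf{need}\ \underline{x}.\ M & = & M[\underline{x} \mapsto \mathsf{return}\ V] & : C
\end{array}
$$

(a) β laws

$$
\begin{array}{rcll}
\Gamma \vdash_v () & = & V & : \mathsf{unit} \\
\Gamma \vdash_v (\mathsf{fst}\ V, \mathsf{snd}\ V) & = & V & : A_1 \times A_2 \\
\Gamma \vdash_v \mathsf{case}\ W\ \mathsf{of}\ \{\mathsf{inl}\ y.\ V[x \mapsto \mathsf{inl}\ y],\ \mathsf{inr}\ z.\ V[x \mapsto \mathsf{inr}\ z]\} & = & V[x \mapsto W] & : B \\
\Gamma \vdash_v \mathsf{thunk}\ (\mathsf{force}\ V) & = & V & : \mathbf{U}\,C \\
\Gamma \vdash \lambda\{i.\ i\,`M\}_{i \in I} & = & M & : \prod_{i \in I} C_i \\
\Gamma \vdash \lambda x.\ x\,`M & = & M & : A \to C \\
\Gamma \vdash M\ \mathsf{to}\ x.\ \mathsf{return}\ x & = & M & : \mathbf{Fr}\,A
\end{array}
$$

(b) η laws

$$
\begin{array}{rcll}
\Gamma \vdash M\ \mathsf{need}\ \underline{x}.\ \underline{x}\ \mathsf{to}\ y.\ N & = & M\ \mathsf{to}\ y.\ N[\underline{x} \mapsto \mathsf{return}\ y] & : C \\
\Gamma \vdash M\ \mathsf{need}\ \underline{x}.\ N & = & N & : C \\[1ex]
\Gamma \vdash \lambda\{i.\ M\ \mathsf{to}\ x.\ N_i\}_{i \in I} & = & M\ \mathsf{to}\ x.\ \lambda\{i.\ N_i\}_{i \in I} & : \prod_{i \in I} C_i \\
\Gamma \vdash \lambda y.\ M\ \mathsf{to}\ x.\ N & = & M\ \mathsf{to}\ x.\ \lambda y.\ N & : A \to C \\
\Gamma \vdash \lambda\{i.\ M\ \mathsf{need}\ \underline{x}.\ N_i\}_{i \in I} & = & M\ \mathsf{need}\ \underline{x}.\ \lambda\{i.\ N_i\}_{i \in I} & : \prod_{i \in I} C_i \\
\Gamma \vdash \lambda y.\ M\ \mathsf{need}\ \underline{x}.\ N & = & M\ \mathsf{need}\ \underline{x}.\ \lambda y.\ N & : A \to C \\[1ex]
\Gamma \vdash (M_1\ \mathsf{to}\ x.\ M_2)\ \mathsf{to}\ y.\ M_3 & = & M_1\ \mathsf{to}\ x.\ M_2\ \mathsf{to}\ y.\ M_3 & : C \\
\Gamma \vdash M_1\ \mathsf{to}\ x.\ M_2\ \mathsf{need}\ \underline{y}.\ M_3 & = & M_2\ \mathsf{need}\ \underline{y}.\ M_1\ \mathsf{to}\ x.\ M_3 & : C \\
\Gamma \vdash (M_1\ \mathsf{need}\ \underline{x}.\ M_2)\ \mathsf{to}\ y.\ M_3 & = & M_1\ \mathsf{need}\ \underline{x}.\ M_2\ \mathsf{to}\ y.\ M_3 & : C \\
\Gamma \vdash (M_1\ \mathsf{need}\ \underline{x}.\ M_2)\ \mathsf{need}\ \underline{y}.\ M_3 & = & M_1\ \mathsf{need}\ \underline{x}.\ M_2\ \mathsf{need}\ \underline{y}.\ M_3 & : C
\end{array}
$$

(c) Sequencing axioms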


Fig. 3. Equational theory of ECBPV 

of a tuple can be used within a single computation (without thunking), so the
effects still will not happen twice. (There is a similar consideration for functions, 
which can only be applied once.) The remainder of the axioms never duplicate 
need-bound terms that might have effects. 

The majority of the axioms of the equational theory are standard. Only the 
axioms involving need are new; these are highlighted. The first new sequencing 
axiom (in Fig. 3c) is the crucial one. It states that if a computation will next 
evaluate x, where x is a computation variable bound to M, then this is the same 
as evaluating M, and then using the result for subsequent uses of x. In particular, 
this axiom (together with the η law for Fr) implies that M need x. x = M.
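
The last implication can be spelled out as a short derivation. This is a sketch of one way to obtain it, assuming the first sequencing axiom is instantiated with N = return y and writing $\underline{x}$ for the computation variable:

$$
\begin{array}{rcll}
M\ \mathsf{need}\ \underline{x}.\ \underline{x}
 & = & M\ \mathsf{need}\ \underline{x}.\ \underline{x}\ \mathsf{to}\ y.\ \mathsf{return}\ y & (\eta \text{ law for } \mathbf{Fr}) \\
 & = & M\ \mathsf{to}\ y.\ (\mathsf{return}\ y)[\underline{x} \mapsto \mathsf{return}\ y] & (\text{first sequencing axiom}) \\
 & = & M\ \mathsf{to}\ y.\ \mathsf{return}\ y & (\underline{x} \text{ is not free in } \mathsf{return}\ y) \\
 & = & M & (\eta \text{ law for } \mathbf{Fr})
\end{array}
$$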

The second sequencing axiom does garbage collection [22]: if a computation 
bound by need is not used (because the variable does not appear), then the 
binding can be dropped. This equation implies, for example, that 


M1 need x1. M2 need x2. ··· Mn need xn. return () = return ()


The next four sequencing axioms (two from CBPV and two new) state that 
binding a computation with to or need commutes with the remaining forms 
of computation terms. These allow to and need to be moved to the outside of 
other constructs except thunks. The final four axioms (one from CBPV and three 
new) capture associativity and commutativity involving need and to; again these 
parallel the existing simple associativity axiom for to. 

Note that associativity between different evaluation orders is not necessarily 
valid. In particular, we do not have 


(M1 to x. M2) need y. M3 = M1 to x. (M2 need y. M3)


(The first term might not evaluate M1, the second always does.) This is usually
the case when evaluation orders are mixed [26]. 
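
The failure of this mixed associativity can be observed in a small operational model. The Python sketch below is only an illustration under stated assumptions: computations are zero-argument callables, `ret`, `to` and `need` are hypothetical helpers, and M3 is chosen so that it never uses y.

```python
def ret(v):
    """return V: a computation that returns v without effects."""
    return lambda: v

def to(m, body):
    """M to x. N: run m eagerly, then run body applied to its result."""
    def run():
        return body(m())()
    return run

def need(m, body):
    """M need x. N: m is run only on the first use of x inside the body."""
    def run():
        cache = []
        def x():
            if not cache:
                cache.append(m())
            return cache[0]
        return body(x)()
    return run

trace = []

def m1():
    """M1: an effectful computation that records that it ran."""
    trace.append("M1")
    return 1

m2 = lambda x: ret(x + 1)    # M2, which may mention x
m3 = lambda y: ret("done")   # M3, which never uses y

# (M1 to x. M2) need y. M3
left_result = need(to(m1, m2), m3)()
left_trace = list(trace)

# M1 to x. (M2 need y. M3)
trace.clear()
right_result = to(m1, lambda x: need(m2(x), m3))()
right_trace = list(trace)
```

Both terms return the same value in this model, but only the second performs the effect of M1, so they are distinguishable as soon as M1 has an observable effect.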

These final two groups allow computation terms to be placed in normal forms 
where bindings of computations are on the outside. (Compare this with the 
translation of source-language answers given in Sect. 3.2.) Finally, the β law
for need (in Fig. 3a) parallels the usual β law for to: it gives the behaviour of
computation terms that return values without having any effects. 

The above equational theory induces a notion of contextual equivalence ≅ctx
between ECBPV terms. Two terms are contextually equivalent when they have
no observable differences in behaviour. When we discuss equivalences between
evaluation orders in Sect. 3, ≅ctx is the notion of equivalence between terms that
we consider. 

Contextual equivalence is defined as follows. The ground types G are the 
value types that do not contain thunks: 


G ::= unit | G1 × G2 | G1 + G2


A value-term context C[—] is a computation term with a single hole (written 
—), which occurs in a position where a value term is expected. We write C[V] 
for the computation term that results from replacing the hole with V. Similarly, 
computation-term contexts C[—] are computation terms with a single hole where a
computation term is expected, and C[M] is the term in which the hole is replaced
by M. Contextual equivalence says that the terms cannot be distinguished by
closed computations that return ground types. (Recall that ⋄ is the empty typing
context.)


Definition 2 (Contextual equivalence). There are two judgement forms of 
contextual equivalence. 


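1. Between value terms: $\Gamma \vdash_v V \cong_{\mathrm{ctx}} W : A$ if $\Gamma \vdash_v V : A$,
$\Gamma \vdash_v W : A$, and for all ground types G and value-term contexts C[—] such that
$\diamond \vdash C[V] : \mathbf{Fr}\,G$ and $\diamond \vdash C[W] : \mathbf{Fr}\,G$ we have

$$\diamond \vdash C[V] = C[W] : \mathbf{Fr}\,G$$

2. Between computation terms: $\Gamma \vdash M \cong_{\mathrm{ctx}} N : C$ if $\Gamma \vdash M : C$,
$\Gamma \vdash N : C$, and for all ground types G and computation-term contexts C[—] such that
$\diamond \vdash C[M] : \mathbf{Fr}\,G$ and $\diamond \vdash C[N] : \mathbf{Fr}\,G$ we have

$$\diamond \vdash C[M] = C[N] : \mathbf{Fr}\,G$$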


3 Call-by-Name and Call-by-Need 


Extended call-by-push-value can be used to prove equivalences between eval- 
uation orders. In this section we prove a classic example: if the only effect 
in the source language is nontermination, then call-by-name is equivalent to 
call-by-need. We do this in two stages. 

First, we show that call-by-name is equivalent to call-by-need within ECBPV 
(Sect. 3.1). Specifically, we show that 


M name x. N ≅ctx M need x. N


(Recall that M name x. N is syntactic sugar for thunk M ‘ λy. N[x ↦ force y].)

Second, an important corollary is that the meta-level reduction strategies are 
equivalent (Sect. 3.2). We show this by describing a lambda-calculus-based source 
language together with a call-by-name and a call-by-need operational semantics 
and giving sound (see Theorem 2) call-by-name and call-by-need translations into 
ECBPV. The former is based on the translation into the monadic metalanguage 
given by Moggi [25] (we expect Levy’s translation [17] to work equally well). 
The call-by-need translation is new here, and its existence shows that ECBPV 
does indeed subsume call-by-need. We then show that given any source-language 
expression, the two translations give contextually equivalent ECBPV terms. 

To model non-termination being our sole source-language effect, we use the 
ECBPV signature which contains a constant $\bot_A : \mathbf{U}\,(\mathbf{Fr}\,A)$ for each value type
A, representing a thunked diverging computation. It is likely that our proofs still
work if we have general fixed-point operators as constants, but for simplicity we
do not consider this here. The constants $\bot_A$ enable us to define a diverging
computation $\Omega_C$ for each computation type C:


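$$
\Omega_{\mathbf{Fr}\,A} := \mathsf{force}\ \bot_A
\qquad
\Omega_{\prod_{i \in I} C_i} := \lambda\{i.\ \Omega_{C_i}\}_{i \in I}
\qquad
\Omega_{A \to C} := \lambda x.\ \Omega_C
$$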


We characterise nontermination by augmenting the equational theory of Sect. 2.3
with the axiom

$$\Gamma \vdash \Omega_{\mathbf{Fr}\,A}\ \mathsf{to}\ x.\ M = \Omega_C : C \qquad \text{(Omega)}$$

for each context Γ, value type A and computation type C. In other words, diverging
as part of a larger computation causes the entire computation to diverge.
This is the only change to the equational theory we need to represent nontermi- 
nation. In particular, we do not add additional axioms involving need. 


3.1 The Equivalence at the Object (Internal) Level 


In this section, we show our primary result that 
M name x. N ≅ctx M need x. N


As is usually the case for proofs of contextual equivalence, we use logical relations 
to get a strong enough inductive hypothesis for the proof to go through. However, 
unlike the usual case, it does not suffice to relate closed terms. To see why, 
consider a closed term M of the form 


$$\Omega_{\mathbf{Fr}\,A}\ \mathsf{need}\ \underline{x}.\ N_1\ \mathsf{to}\ y.\ N_2$$


If we relate only closed terms, then we do not learn anything about $N_1$ itself
(since x may be free in it). We could attempt to proceed by considering the closed
term $\Omega_{\mathbf{Fr}\,A}$ need x. $N_1$. For example, if this returns a value V then x cannot have
been evaluated and M should have the same behaviour as $\Omega_{\mathbf{Fr}\,A}$ need x. $N_2[y \mapsto V]$.
However, we get stuck when proving the last step. This is only a problem
because $\Omega_{\mathbf{Fr}\,A}$ is a nonterminating computation: every terminating computation
of returner type has the form return V (up to =), and when these are bound
using need we can eliminate the binding using the equation


return V need x. M = M[x ↦ return V]


The solution is to relate terms that may have free computation variables (we 
do not need to consider free value variables). The free computation variables 
should be thought of as referring to nonterminating computations (because we 
can remove the bindings of variables that refer to terminating computations). 
We relate open terms using Kripke logical relations of varying arity, which were 
introduced by Jung and Tiuryn [12] to study lambda definability. 

We need a number of definitions first. A context Γ′ weakens another context
Γ, written Γ′ ▷ Γ, whenever Γ is a sublist of Γ′. For example, (Γ, x : Fr A) ▷ Γ.
We define $\mathrm{Term}^{\Gamma}_{A}$ as the set of equivalence classes (up to the equational
theory =) of terms of value type A in context Γ, and similarly define $\mathrm{Term}^{\Gamma}_{D}$ for
computation types:

$$
\mathrm{Term}^{\Gamma}_{A} := \{[V]_{=} \mid \Gamma \vdash_v V : A\}
\qquad
\mathrm{Term}^{\Gamma}_{D} := \{[M]_{=} \mid \Gamma \vdash M : D\}
$$

Since weakening is admissible for both typing judgements, Γ′ ▷ Γ implies that
$\mathrm{Term}^{\Gamma}_{A} \subseteq \mathrm{Term}^{\Gamma'}_{A}$ and $\mathrm{Term}^{\Gamma}_{D} \subseteq \mathrm{Term}^{\Gamma'}_{D}$ (note the contravariance).

A computation context, ranged over by Δ, is a typing context that maps
variables to computation types (i.e. has the form $\underline{x}_1 : \mathbf{Fr}\,A_1, \ldots, \underline{x}_n : \mathbf{Fr}\,A_n$).
Variables in computation contexts refer to nonterminating computations for the 
proof of contextual equivalence. A Kripke relation is a family of binary relations 
indexed by computation contexts that respects weakening of terms: 


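Definition 3 (Kripke relation). A Kripke relation R over a value type A
(respectively a computation type D) is a family of relations
$R^{\Delta} \subseteq \mathrm{Term}^{\Delta}_{A} \times \mathrm{Term}^{\Delta}_{A}$ (respectively $R^{\Delta} \subseteq \mathrm{Term}^{\Delta}_{D} \times \mathrm{Term}^{\Delta}_{D}$)
indexed by computation contexts Δ such that whenever Δ′ ▷ Δ we have $R^{\Delta} \subseteq R^{\Delta'}$.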


Note that we consider binary relations on equivalence classes of terms because 
we want to relate pairs of terms up to = (to prove contextual equivalence). 
The relations we define are partial equivalence relations (i.e. symmetric and 
transitive), though we do not explicitly use this fact. 

We need the Kripke relations we define over computation terms to be 
closed under sequencing with nonterminating computations. (For the rest of 
this section, we omit the square brackets around equivalence classes.)


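Definition 4. A Kripke relation R over a computation type C is closed under
sequencing if each of the following holds:


1. If $(\underline{x} : \mathbf{Fr}\,A) \in \Delta$ and $M, M' \in \mathrm{Term}^{\Delta, y : A}_{C}$ then
   $(\underline{x}\ \mathsf{to}\ y.\ M,\ \underline{x}\ \mathsf{to}\ y.\ M') \in R^{\Delta}$.

2. The pair $(\Omega_C, \Omega_C)$ is in $R^{\Delta}$.

3. For all $(M, M') \in R^{\Delta, \underline{y} : \mathbf{Fr}\,A}$ and $N \in \{\Omega_{\mathbf{Fr}\,A}\} \cup \{\underline{x} \mid (\underline{x} : \mathbf{Fr}\,A) \in \Delta\}$,
   all four of the following pairs are in $R^{\Delta}$:

$$
\begin{array}{ll}
(N\ \mathsf{need}\ \underline{y}.\ M,\ N\ \mathsf{need}\ \underline{y}.\ M') & (M[\underline{y} \mapsto N],\ M'[\underline{y} \mapsto N]) \\
(M[\underline{y} \mapsto N],\ N\ \mathsf{need}\ \underline{y}.\ M') & (N\ \mathsf{need}\ \underline{y}.\ M,\ M'[\underline{y} \mapsto N])
\end{array}
$$

For the first case of the definition, recall that the computation variables in Δ
refer to nonterminating computations. Hence the behaviours of M and M′ are
irrelevant (they are never evaluated), and we do not need to assume they are
related.³ The second case implies (using axiom Omega) that

$$(\Omega_{\mathbf{Fr}\,A}\ \mathsf{to}\ y.\ M,\ \Omega_{\mathbf{Fr}\,A}\ \mathsf{to}\ y.\ M') \in R^{\Delta}$$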


³ This is why it suffices to consider only computation contexts. If we had to relate
M to M’ then we would need to consider relations between terms with free value 
variables. 
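$$R^{\Delta}_{A} \subseteq \mathrm{Term}^{\Delta}_{A} \times \mathrm{Term}^{\Delta}_{A}$$

$$
\begin{array}{rcl}
R^{\Delta}_{\mathsf{unit}} & := & \{((), ())\} \\
R^{\Delta}_{A_1 \times A_2} & := & \{(V, V') \mid (\mathsf{fst}\ V, \mathsf{fst}\ V') \in R^{\Delta}_{A_1} \wedge (\mathsf{snd}\ V, \mathsf{snd}\ V') \in R^{\Delta}_{A_2}\} \\
R^{\Delta}_{A_1 + A_2} & := & \{(\mathsf{inl}\ V, \mathsf{inl}\ V') \mid (V, V') \in R^{\Delta}_{A_1}\} \cup \{(\mathsf{inr}\ V, \mathsf{inr}\ V') \mid (V, V') \in R^{\Delta}_{A_2}\} \\
R^{\Delta}_{\mathbf{U}\,C} & := & \{(V, V') \mid (\mathsf{force}\ V, \mathsf{force}\ V') \in R^{\Delta}_{C}\}
\end{array}
$$

$$R^{\Delta}_{C} \subseteq \mathrm{Term}^{\Delta}_{C} \times \mathrm{Term}^{\Delta}_{C}$$

$$
\begin{array}{rcl}
R_{\mathbf{Fr}\,A} & := & \text{the smallest closed-under-sequencing Kripke relation such that} \\
 & & (V, V') \in R^{\Delta}_{A} \Rightarrow (\mathsf{return}\ V, \mathsf{return}\ V') \in R^{\Delta}_{\mathbf{Fr}\,A} \\
R^{\Delta}_{\prod_{i \in I} C_i} & := & \{(M, M') \mid \forall i \in I.\ (i\,`M,\ i\,`M') \in R^{\Delta}_{C_i}\} \\
R^{\Delta}_{A \to C} & := & \{(M, M') \mid \forall \Delta', V, V'.\ \Delta' \triangleright \Delta \wedge (V, V') \in R^{\Delta'}_{A} \Rightarrow (V\,`M,\ V'\,`M') \in R^{\Delta'}_{C}\}
\end{array}
$$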


Fig. 4. Definition of the logical relation 


mirroring the first case. The third case is the most important. It is similar to the 
first (it is there to ensure that the relation is closed under the primitives used to 
combine computations). However, since we are showing that need is contextually 
equivalent to substitution, we also want these to be related. We have to consider 
computation variables in the definition (as possible terms N) only because of 
our use of Kripke logical relations. For ordinary logical relations, there would be 
no free variables to consider. 

The key part of the proof of contextual equivalence is the definition of the 
Kripke logical relation, which is a family of relations indexed by value and com- 
putation types. It is defined in Fig. 4 by induction on the structure of the types. 
In the figure, we again omit square brackets around equivalence classes. 

The definition of the logical relation on ground types (unit, sum types and 
product types) is standard. Since the only way to use a thunk is to force it, 
the definition on thunk types just requires the two forced computations to be 
related. 

For returner types, we want any pair of computations that return related 
values to be related. We also want the relation to be closed under sequencing, 
in order to show the fundamental lemma (below) for to and need. We therefore 
define $R_{\mathbf{Fr}\,A}$ as the smallest such Kripke relation. For products of computation
types the definition is similar to products of value types: we require that each of 
the projections are related. For function types, we require as usual that related 
arguments are sent to related results. For this to define a Kripke relation, we 
have to quantify over all computation contexts Δ′ that weaken Δ, because of
the contravariance of the argument. 

The relations we define are Kripke relations. Using the sequencing axioms of
the equational theory, and the β and η laws for computation types, we can show
that $R_C$ is closed under sequencing for each computation type C. These facts
are important for the proof of the fundamental lemma. 

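Substitutions are given by the following grammar:

\[ \sigma ::= \diamond \mid \sigma, x \mapsto V \mid \sigma, x \mapsto M \]

We have a typing judgement $\Delta \vdash \sigma : \Gamma$ for substitutions, meaning that in the context
$\Delta$ the terms in $\sigma$ have the types given in $\Gamma$. This is defined as follows:

\[
\frac{}{\Delta \vdash \diamond : \diamond}
\qquad
\frac{\Delta \vdash \sigma : \Gamma \quad \Delta \vdash_v V : A}{\Delta \vdash (\sigma, x \mapsto V) : (\Gamma, x : A)}
\qquad
\frac{\Delta \vdash \sigma : \Gamma \quad \Delta \vdash M : \mathrm{Fr}\,A}{\Delta \vdash (\sigma, x \mapsto M) : (\Gamma, x : \mathrm{Fr}\,A)}
\]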


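We write $V[\sigma]$ and $M[\sigma]$ for the applications of the substitution $\sigma$ to value terms
$V$ and computation terms $M$. These are defined by induction on the structure
of the terms. The key property of the substitution typing judgement is that
if $\Delta \vdash \sigma : \Gamma$, then $\Gamma \vdash_v V : A$ implies $\Delta \vdash_v V[\sigma] : A$, and $\Gamma \vdash M : C$
implies $\Delta \vdash M[\sigma] : C$. The equational theory gives us an obvious pointwise
equivalence relation $\equiv$ on well-typed substitutions. We define sets $\mathrm{Subst}^{\Delta}_{\Gamma}$ of
equivalence classes of substitutions, and extend the logical relation by defining
$R^{\Delta}_{\Gamma} \subseteq \mathrm{Subst}^{\Delta}_{\Gamma} \times \mathrm{Subst}^{\Delta}_{\Gamma}$:

\[ \mathrm{Subst}^{\Delta}_{\Gamma} := \{[\sigma]_{\equiv} \mid \Delta \vdash \sigma : \Gamma\} \]

\[
\begin{aligned}
R^{\Delta}_{\diamond} &:= \{(\diamond, \diamond)\} \\
R^{\Delta}_{\Gamma, x : A} &:= \{((\sigma, x \mapsto V), (\sigma', x \mapsto V')) \mid (\sigma, \sigma') \in R^{\Delta}_{\Gamma} \wedge (V, V') \in R^{\Delta}_{A}\} \\
R^{\Delta}_{\Gamma, x : \mathrm{Fr}\,A} &:= \{((\sigma, x \mapsto M), (\sigma', x \mapsto M')) \mid (\sigma, \sigma') \in R^{\Delta}_{\Gamma} \wedge (M, M') \in R^{\Delta}_{\mathrm{Fr}\,A}\}
\end{aligned}
\]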


As usual, the logical relations satisfy a fundamental lemma.

Lemma 1 (Fundamental)

1. For all value terms $\Gamma \vdash_v V : A$,

   \[ (\sigma, \sigma') \in R^{\Delta}_{\Gamma} \Rightarrow (V[\sigma], V[\sigma']) \in R^{\Delta}_{A} \]

2. For all computation terms $\Gamma \vdash M : C$,

   \[ (\sigma, \sigma') \in R^{\Delta}_{\Gamma} \Rightarrow (M[\sigma], M[\sigma']) \in R^{\Delta}_{C} \]


The proof is by induction on the structure of the terms. We use the fact that
each $R_C$ is closed under sequencing for the to and need cases. For the latter, we
also use the fact that the relations respect weakening of terms.

We also have the following two facts about the logical relation. The first 
roughly is that name is related to need by the logical relation, and is true 
because of the additional pairs that are related in the definition of closed-under- 
sequencing (Definition 4). 
Lemma 2. For all computation terms $\Gamma \vdash M : \mathrm{Fr}\,A$ and $\Gamma, x : \mathrm{Fr}\,A \vdash N : C$
we have

\[ (\sigma, \sigma') \in R^{\Delta}_{\Gamma} \Rightarrow ((N[x \mapsto M])[\sigma],\ (M\ \mathsf{need}\ x.\ N)[\sigma']) \in R^{\Delta}_{C} \]

The second fact is that related terms are contextually equivalent. 


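Lemma 3

1. For all value terms $\Gamma \vdash_v V : A$ and $\Gamma \vdash_v V' : A$, if $(V[\sigma], V'[\sigma']) \in R^{\Delta}_{A}$ for
   all $(\sigma, \sigma') \in R^{\Delta}_{\Gamma}$ then

   \[ \Gamma \vdash_v V \cong_{\mathrm{ctx}} V' : A \]

2. For all computation terms $\Gamma \vdash M : C$ and $\Gamma \vdash M' : C$, if $(M[\sigma], M'[\sigma']) \in R^{\Delta}_{C}$
   for all $(\sigma, \sigma') \in R^{\Delta}_{\Gamma}$ then

   \[ \Gamma \vdash M \cong_{\mathrm{ctx}} M' : C \]
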
This gives us enough to achieve the goal of this section. 


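Theorem 1. For all computation terms $\Gamma \vdash M : \mathrm{Fr}\,A$ and $\Gamma, x : \mathrm{Fr}\,A \vdash N : C$,
we have

\[ \Gamma \vdash M\ \mathsf{name}\ x.\ N \cong_{\mathrm{ctx}} M\ \mathsf{need}\ x.\ N : C \]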


3.2 The Meta-level Equivalence 


In this section, we show that the equivalence between call-by-name and call- 
by-need also holds on the meta-level; this is a consequence of the object-level 
theorem, rather than something that is proved from scratch as it would be in a 
term rewriting system. 

To do this, we describe a simple lambda-calculus-based source language with 
divergence as the only side-effect and give it a call-by-name and a call-by-need 
operational semantics. We then describe two translations from the source lan- 
guage into ECBPV. The first is a call-by-name translation based on the embed- 
ding of call-by-name in Moggi’s [25] monadic metalanguage. The second is a 
call-by-need translation that uses our new constructs. The latter witnesses the 
fact that ECBPV does actually support call-by-need. Finally, we show that the 
two translations give contextually equivalent ECBPV terms. 

The syntax, type system and operational semantics of the source language are
given in Fig. 5. Most of this is standard. We include only booleans and function
types for simplicity. In expressions, we include a constant $\mathsf{diverge}_A$ for each type
$A$, representing a diverging computation. (As before, it should not be difficult to
replace these with general fixed-point operators.) In typing contexts, we assume
that all variables are distinct, and omit the required side-condition from the
figure. There is a single set of variables $x, y, \ldots$; we implicitly map these to
ECBPV value or computation variables as required.
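Types        $A, B ::= \mathsf{bool} \mid A \to B$
Contexts     $\Gamma ::= \diamond \mid \Gamma, x : A$
Expressions  $e ::= x \mid \mathsf{diverge}_A \mid \mathsf{true} \mid \mathsf{false} \mid \mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \mid \lambda x.\, e \mid e_1\, e_2$

(a) Syntax

\[
\frac{}{\Gamma \vdash x : A}\ \text{if } (x : A) \in \Gamma
\qquad
\frac{}{\Gamma \vdash \mathsf{diverge}_A : A}
\qquad
\frac{}{\Gamma \vdash \mathsf{true} : \mathsf{bool}}
\qquad
\frac{}{\Gamma \vdash \mathsf{false} : \mathsf{bool}}
\]
\[
\frac{\Gamma \vdash e_1 : \mathsf{bool} \quad \Gamma \vdash e_2 : A \quad \Gamma \vdash e_3 : A}{\Gamma \vdash \mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 : A}
\qquad
\frac{\Gamma, x : A \vdash e : B}{\Gamma \vdash \lambda x.\, e : A \to B}
\qquad
\frac{\Gamma \vdash e_1 : A \to B \quad \Gamma \vdash e_2 : A}{\Gamma \vdash e_1\, e_2 : B}
\]

(b) Typing

\[
\begin{array}{ll}
\mathsf{if}\ \mathsf{true}\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \stackrel{\mathrm{name}}{\leadsto} e_2
  & \mathsf{diverge}_A \stackrel{\mathrm{name}}{\leadsto} \mathsf{diverge}_A \\
\mathsf{if}\ \mathsf{false}\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \stackrel{\mathrm{name}}{\leadsto} e_3
  & \mathsf{if}\ \mathsf{diverge}_{\mathsf{bool}}\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \stackrel{\mathrm{name}}{\leadsto} \mathsf{diverge}_A \\
(\lambda x.\, e)\, e' \stackrel{\mathrm{name}}{\leadsto} e[x \mapsto e']
  & \mathsf{diverge}_{A \to B}\ e \stackrel{\mathrm{name}}{\leadsto} \mathsf{diverge}_B
\end{array}
\]
\[
\frac{e_1 \stackrel{\mathrm{name}}{\leadsto} e_1'}{\mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \stackrel{\mathrm{name}}{\leadsto} \mathsf{if}\ e_1'\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3}
\qquad
\frac{e_1 \stackrel{\mathrm{name}}{\leadsto} e_1'}{e_1\, e_2 \stackrel{\mathrm{name}}{\leadsto} e_1'\, e_2}
\]

(c) Call-by-name operational semantics

Evaluation contexts  $E[-] ::= - \mid \mathsf{if}\ E[-]\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \mid E[-]\, e_2 \mid (\lambda x.\, E[x])\, E'[-] \mid (\lambda x.\, E[-])\, e_2$
Values               $v ::= \mathsf{true} \mid \mathsf{false} \mid \lambda x.\, e$
Answers              $a ::= v \mid (\lambda x.\, a)\, e$

\[
\begin{array}{ll}
\mathsf{if}\ \mathsf{true}\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \stackrel{\mathrm{need}}{\leadsto} e_2
  & \mathsf{diverge}_A \stackrel{\mathrm{need}}{\leadsto} \mathsf{diverge}_A \\
\mathsf{if}\ \mathsf{false}\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \stackrel{\mathrm{need}}{\leadsto} e_3
  & E[\mathsf{diverge}_A] \stackrel{\mathrm{need}}{\leadsto} \mathsf{diverge}_B \\
(\lambda x.\, E[x])\, v \stackrel{\mathrm{need}}{\leadsto} (\lambda x.\, E[v])\, v & \\
(\lambda x.\, a)\, e_1\, e_2 \stackrel{\mathrm{need}}{\leadsto} (\lambda x.\, a\, e_2)\, e_1 & \\
(\lambda x.\, E[x])\, ((\lambda y.\, a)\, e) \stackrel{\mathrm{need}}{\leadsto} (\lambda y.\, (\lambda x.\, E[x])\, a)\, e
  & \dfrac{e \stackrel{\mathrm{need}}{\leadsto} e'}{E[e] \stackrel{\mathrm{need}}{\leadsto} E[e']}
\end{array}
\]

(d) Call-by-need operational semantics


Fig. 5. The source language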


The call-by-name operational semantics is straightforward; its small-step
reductions are written $e \stackrel{\mathrm{name}}{\leadsto} e'$.

The call-by-need operational semantics is based on Ariola and Felleisen [2].
The only differences between the source language and Ariola and Felleisen's
calculus are the addition of booleans, $\mathsf{diverge}_A$, and a type system. It is likely
that we can translate other call-by-need calculi, such as those of Launchbury [16]
and Maraist et al. [22]. Call-by-need small-step reductions are written $e \stackrel{\mathrm{need}}{\leadsto} e'$.
The call-by-need semantics needs some auxiliary definitions. An evaluation
context $E[-]$ is a source-language expression with a single hole, picked from
the grammar given in the figure. The hole in an evaluation context indicates
where reduction is currently taking place: it says which part of the expression is
currently needed. We write $E[e]$ for the expression in which the hole is replaced
with $e$. A (source-language) value is the result of a computation (the word value
should not be confused with the value terms of extended call-by-push-value).
An answer is a value in some environment, which maps variables to expressions.
These can be thought of as closures. The environment is encoded in an answer
using application and lambda abstraction: the answer $(\lambda x.\, a)\, e$ means the answer
$a$ where the environment maps $x$ to $e$. Encoding environments in this way makes
the translation slightly simpler than if we had used a Launchbury-style [16]
call-by-need language with explicit environments. In the latter case, the translation
would need to encode the environments. Here they are already encoded inside
expressions. Answers are terminal computations: they do not reduce.

The first two reduction axioms (on the left) of the call-by-need semantics
(Fig. 5d) are obvious. The third axiom is the most important: it states that if
the subexpression currently being evaluated is a variable $x$, and the environment
maps $x$ to a source-language value $v$, then that use of $x$ can be replaced with
$v$. Note that $E[v]$ may contain other uses of $x$; the replacement only occurs
when the value is actually needed. This axiom roughly corresponds to the first
sequencing axiom of the equational theory of ECBPV (in Fig. 3c). The fourth and
fifth axioms of the call-by-need operational semantics rearrange the environment
into a standard form. Both use a syntactic restriction to answers so that each
expression has at most one reduct (this restriction is not needed to ensure that
$\stackrel{\mathrm{need}}{\leadsto}$ captures call-by-need). The rule on the right of Fig. 5d states that the
reduction relation is a congruence (a needed subexpression can be reduced).
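
The observable difference between the two strategies, namely how often an argument is evaluated, can be illustrated with a small environment-based evaluator. This is only a sketch and not the small-step semantics of Fig. 5: it assumes a big-step evaluator, omits diverge, and all names in it (Delayed, evaluate, run) are hypothetical. A flag selects whether a suspended argument is re-evaluated at every use (call-by-name) or cached after its first use (call-by-need), and a counter records how many times arguments are evaluated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Bool:
    value: bool


@dataclass(frozen=True)
class If:
    cond: object
    then: object
    other: object


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


@dataclass
class Closure:
    param: str
    body: object
    env: dict


class Delayed:
    """A suspended argument together with the environment it was created in."""

    def __init__(self, expr, env, share, counter):
        self.expr, self.env = expr, env
        self.share = share        # True: call-by-need, False: call-by-name
        self.counter = counter    # one-element list counting argument evaluations
        self.cached = None
        self.done = False

    def force(self):
        if self.share and self.done:
            return self.cached    # reuse the value computed on first use
        self.counter[0] += 1
        value = evaluate(self.expr, self.env, self.share, self.counter)
        self.cached, self.done = value, True
        return value


def evaluate(expr, env, share, counter):
    if isinstance(expr, Var):
        return env[expr.name].force()
    if isinstance(expr, Bool):
        return expr.value
    if isinstance(expr, If):
        chosen = expr.then if evaluate(expr.cond, env, share, counter) else expr.other
        return evaluate(chosen, env, share, counter)
    if isinstance(expr, Lam):
        return Closure(expr.param, expr.body, env)
    if isinstance(expr, App):
        fun = evaluate(expr.fun, env, share, counter)
        arg = Delayed(expr.arg, env, share, counter)
        return evaluate(fun.body, {**fun.env, fun.param: arg}, share, counter)
    raise TypeError(f"unknown expression: {expr!r}")


def run(expr, share):
    """Return (result, number of argument evaluations)."""
    counter = [0]
    return evaluate(expr, {}, share, counter), counter[0]


# (\x. if x then x else false) true
example = App(Lam("x", If(Var("x"), Var("x"), Bool(False))), Bool(True))
print(run(example, share=False))  # call-by-name: (True, 2)
print(run(example, share=True))   # call-by-need: (True, 1)
```

Both runs return the same result; only the number of argument evaluations differs, which is invisible when the argument has no side-effects other than nontermination.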

The two translations from the source language to ECBPV are given in Fig. 6.
The translation of types (Fig. 6a) is shared between call-by-name and
call-by-need. The two translations differ only for contexts and expressions. Types $A$ are
translated into value types $\llparenthesis A \rrparenthesis$. The type bool becomes the two-element sum
type unit + unit. The translation of a function type $A \to B$ is a thunked CBPV
function type. The argument is a thunk of a computation that returns an $\llparenthesis A \rrparenthesis$,
and the result is a computation that returns a $\llparenthesis B \rrparenthesis$.

For call-by-name (Fig. 6b), contexts $\Gamma$ are translated into contexts $\llparenthesis \Gamma \rrparenthesis^{\mathrm{name}}$
that contain thunks of computations. We could also have used contexts
containing computation variables (omitting the thunks), but choose to use thunks
to keep the translation as close as possible to previous translations into
call-by-push-value. A well-typed expression $\Gamma \vdash e : A$ is translated into an ECBPV
computation term $\llparenthesis e \rrparenthesis^{\mathrm{name}}$ that returns $\llparenthesis A \rrparenthesis$, in context $\llparenthesis \Gamma \rrparenthesis^{\mathrm{name}}$. The translation
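\[
\llparenthesis \mathsf{bool} \rrparenthesis := \mathsf{unit} + \mathsf{unit}
\qquad
\llparenthesis A \to B \rrparenthesis := U\,(U\,(\mathrm{Fr}\,\llparenthesis A \rrparenthesis) \to \mathrm{Fr}\,\llparenthesis B \rrparenthesis)
\]

(a) Translation $\llparenthesis A \rrparenthesis$ of types

Translation $\llparenthesis \Gamma \rrparenthesis^{\mathrm{name}}$ of typing contexts

\[
\llparenthesis \diamond \rrparenthesis^{\mathrm{name}} := \diamond
\qquad
\llparenthesis \Gamma, x : A \rrparenthesis^{\mathrm{name}} := \llparenthesis \Gamma \rrparenthesis^{\mathrm{name}}, x : U\,(\mathrm{Fr}\,\llparenthesis A \rrparenthesis)
\]

Translation $\llparenthesis \Gamma \rrparenthesis^{\mathrm{name}} \vdash \llparenthesis e \rrparenthesis^{\mathrm{name}} : \mathrm{Fr}\,\llparenthesis A \rrparenthesis$ of expressions

\[
\begin{aligned}
\llparenthesis x \rrparenthesis^{\mathrm{name}} &:= \mathsf{force}\ x \\
\llparenthesis \mathsf{diverge}_A \rrparenthesis^{\mathrm{name}} &:= \mathsf{force}\ \bot_A \\
\llparenthesis \mathsf{true} \rrparenthesis^{\mathrm{name}} &:= \mathsf{return}\ \mathsf{inl}\ () \\
\llparenthesis \mathsf{false} \rrparenthesis^{\mathrm{name}} &:= \mathsf{return}\ \mathsf{inr}\ () \\
\llparenthesis \mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \rrparenthesis^{\mathrm{name}} &:= \llparenthesis e_1 \rrparenthesis^{\mathrm{name}}\ \mathsf{to}\ x.\ \mathsf{force}(\mathsf{case}\ x\ \mathsf{of}\ \{\mathsf{inl}\ z.\ \mathsf{thunk}\,\llparenthesis e_2 \rrparenthesis^{\mathrm{name}},\ \mathsf{inr}\ z.\ \mathsf{thunk}\,\llparenthesis e_3 \rrparenthesis^{\mathrm{name}}\}) \\
\llparenthesis \lambda x.\, e \rrparenthesis^{\mathrm{name}} &:= \mathsf{return}\ \mathsf{thunk}\,(\lambda x.\, \llparenthesis e \rrparenthesis^{\mathrm{name}}) \\
\llparenthesis e_1\, e_2 \rrparenthesis^{\mathrm{name}} &:= \llparenthesis e_1 \rrparenthesis^{\mathrm{name}}\ \mathsf{to}\ z.\ (\mathsf{thunk}\,\llparenthesis e_2 \rrparenthesis^{\mathrm{name}})\,`\,(\mathsf{force}\ z)
\end{aligned}
\]

(b) Call-by-name translation

Translation $\llparenthesis \Gamma \rrparenthesis^{\mathrm{need}}$ of typing contexts

\[
\llparenthesis \diamond \rrparenthesis^{\mathrm{need}} := \diamond
\qquad
\llparenthesis \Gamma, x : A \rrparenthesis^{\mathrm{need}} := \llparenthesis \Gamma \rrparenthesis^{\mathrm{need}}, x : \mathrm{Fr}\,\llparenthesis A \rrparenthesis
\]

Translation $\llparenthesis \Gamma \rrparenthesis^{\mathrm{need}} \vdash \llparenthesis e \rrparenthesis^{\mathrm{need}} : \mathrm{Fr}\,\llparenthesis A \rrparenthesis$ of expressions

\[
\begin{aligned}
\llparenthesis x \rrparenthesis^{\mathrm{need}} &:= x \\
\llparenthesis \mathsf{diverge}_A \rrparenthesis^{\mathrm{need}} &:= \mathsf{force}\ \bot_A \\
\llparenthesis \mathsf{true} \rrparenthesis^{\mathrm{need}} &:= \mathsf{return}\ \mathsf{inl}\ () \\
\llparenthesis \mathsf{false} \rrparenthesis^{\mathrm{need}} &:= \mathsf{return}\ \mathsf{inr}\ () \\
\llparenthesis \mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \rrparenthesis^{\mathrm{need}} &:= \llparenthesis e_1 \rrparenthesis^{\mathrm{need}}\ \mathsf{to}\ x.\ \mathsf{force}(\mathsf{case}\ x\ \mathsf{of}\ \{\mathsf{inl}\ z.\ \mathsf{thunk}\,\llparenthesis e_2 \rrparenthesis^{\mathrm{need}},\ \mathsf{inr}\ z.\ \mathsf{thunk}\,\llparenthesis e_3 \rrparenthesis^{\mathrm{need}}\}) \\
\llparenthesis \lambda x.\, e \rrparenthesis^{\mathrm{need}} &:= \mathsf{return}\ \mathsf{thunk}\,(\lambda x'.\, (\mathsf{force}\ x')\ \mathsf{need}\ x.\ \llparenthesis e \rrparenthesis^{\mathrm{need}}) \\
\llparenthesis e_1\, e_2 \rrparenthesis^{\mathrm{need}} &:= \llparenthesis e_1 \rrparenthesis^{\mathrm{need}}\ \mathsf{to}\ z.\ (\mathsf{thunk}\,\llparenthesis e_2 \rrparenthesis^{\mathrm{need}})\,`\,(\mathsf{force}\ z)
\end{aligned}
\]

(c) Call-by-need translation


Fig. 6. Translation from the source language to ECBPV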


of variables just forces the relevant variable in the context. The diverging
computations $\mathsf{diverge}_A$ just use the diverging constants from our ECBPV signature.
The translations of true and false are simple: they are computations that
immediately return one of the elements of the sum type unit + unit. The translation
of $\mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3$ first evaluates $\llparenthesis e_1 \rrparenthesis^{\mathrm{name}}$, then uses the result to choose
between $\llparenthesis e_2 \rrparenthesis^{\mathrm{name}}$ and $\llparenthesis e_3 \rrparenthesis^{\mathrm{name}}$. Lambdas are translated into computations that
just return a thunked computation. Finally, application first evaluates the
computation that returns a thunk of a function, and then forces this function, passing
it a thunk of the argument.

For call-by-need (Fig. 6c), contexts $\Gamma$ are translated into contexts $\llparenthesis \Gamma \rrparenthesis^{\mathrm{need}}$,
containing computations that return values. The computations in the context
are all bound using need. An expression $\Gamma \vdash e : A$ is translated to a computation
$\llparenthesis e \rrparenthesis^{\mathrm{need}}$ that returns $\llparenthesis A \rrparenthesis$ in the context $\llparenthesis \Gamma \rrparenthesis^{\mathrm{need}}$. The typing is therefore similar
to call-by-name. The key case is the translation of lambdas. These become
computations that immediately return a thunk of a function. The function places
the computation given as an argument onto the context using need, so that it is
evaluated at most once, before executing the body. The remainder of the cases
are similar to call-by-name.
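
The two translations can be sketched as a single function that produces ECBPV terms as strings. This is an illustrative sketch only: the tuple encoding of source expressions, the ASCII term syntax (lam, bot, and a backquote for application) and the function name translate are assumptions made here, and names bound by the translation are generated fresh (t0, t1, ...) on the assumption that source variables never have that form.

```python
import itertools


def translate(e, mode, fresh=None):
    """Translate a source expression (nested tuples) into an ECBPV term string.

    mode is "name" or "need". The two modes differ only for variables and
    lambda abstractions.
    """
    if fresh is None:
        fresh = (f"t{i}" for i in itertools.count())
    tag = e[0]
    if tag == "var":
        # call-by-name variables hold thunks and must be forced;
        # call-by-need variables are computation variables used directly
        return f"force {e[1]}" if mode == "name" else e[1]
    if tag == "diverge":
        return "force bot"
    if tag == "true":
        return "return inl ()"
    if tag == "false":
        return "return inr ()"
    if tag == "if":
        _, cond, then, other = e
        b, u = next(fresh), next(fresh)
        c = translate(cond, mode, fresh)
        t = translate(then, mode, fresh)
        o = translate(other, mode, fresh)
        return (f"({c}) to {b}. force(case {b} of "
                f"{{inl {u}. thunk({t}), inr {u}. thunk({o})}})")
    if tag == "lam":
        _, x, body = e
        m = translate(body, mode, fresh)
        if mode == "name":
            return f"return thunk(lam {x}. {m})"
        a = next(fresh)
        # bind the forced argument with need so it runs at most once
        return f"return thunk(lam {a}. (force {a}) need {x}. {m})"
    if tag == "app":
        _, fun, arg = e
        z = next(fresh)
        f = translate(fun, mode, fresh)
        a = translate(arg, mode, fresh)
        return f"({f}) to {z}. (thunk({a})) ` (force {z})"
    raise ValueError(f"unknown expression tag: {tag}")


identity = ("lam", "x", ("var", "x"))
print(translate(identity, "name"))  # return thunk(lam x. force x)
print(translate(identity, "need"))  # return thunk(lam t0. (force t0) need x. x)
```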

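Under the call-by-need translation, the expression $(\lambda x.\, e_1)\, e_2$ is translated
into a term that executes the computation $\llparenthesis e_1 \rrparenthesis^{\mathrm{need}}$, and executes $\llparenthesis e_2 \rrparenthesis^{\mathrm{need}}$ only
when needed. This is the case because, by the $\beta$ rules for thunks, functions, and
returner types:

\[ \llparenthesis (\lambda x.\, e_1)\, e_2 \rrparenthesis^{\mathrm{need}} \equiv \llparenthesis e_2 \rrparenthesis^{\mathrm{need}}\ \mathsf{need}\ x.\ \llparenthesis e_1 \rrparenthesis^{\mathrm{need}} \]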
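
The steps behind this equation can be spelled out as follows. The derivation is a sketch that assumes the standard CBPV $\beta$ laws ($\mathsf{return}\ V\ \mathsf{to}\ z.\ N \equiv N[z \mapsto V]$, $\mathsf{force}\,(\mathsf{thunk}\ M) \equiv M$, and $V\,`\,\lambda x.\, M \equiv M[x \mapsto V]$), that $x'$ is fresh, and that $z$ does not occur in the translation of $e_2$. The abbreviations $N_1$ and $N_2$ for the call-by-need translations of $e_1$ and $e_2$ are introduced only for this derivation.

\[
\begin{aligned}
\llparenthesis (\lambda x.\, e_1)\, e_2 \rrparenthesis^{\mathrm{need}}
 &= (\mathsf{return}\ \mathsf{thunk}\,(\lambda x'.\, (\mathsf{force}\ x')\ \mathsf{need}\ x.\ N_1))\ \mathsf{to}\ z.\ (\mathsf{thunk}\ N_2)\,`\,(\mathsf{force}\ z) \\
 &\equiv (\mathsf{thunk}\ N_2)\,`\,\mathsf{force}\,(\mathsf{thunk}\,(\lambda x'.\, (\mathsf{force}\ x')\ \mathsf{need}\ x.\ N_1)) && (\beta \text{ for returner types}) \\
 &\equiv (\mathsf{thunk}\ N_2)\,`\,(\lambda x'.\, (\mathsf{force}\ x')\ \mathsf{need}\ x.\ N_1) && (\beta \text{ for thunks}) \\
 &\equiv (\mathsf{force}\,(\mathsf{thunk}\ N_2))\ \mathsf{need}\ x.\ N_1 && (\beta \text{ for functions}) \\
 &\equiv N_2\ \mathsf{need}\ x.\ N_1 && (\beta \text{ for thunks})
\end{aligned}
\]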


As a consequence, translations of answers are particularly simple: they have the
following form (up to $\equiv$):

\[ M_1\ \mathsf{need}\ x_1.\ M_2\ \mathsf{need}\ x_2.\ \cdots\ M_n\ \mathsf{need}\ x_n.\ \mathsf{return}\ V \]

which intuitively means the value $V$ in the environment mapping each $x_i$ to $M_i$.


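It is easy to see that both translations produce terms with the correct types.
We prove that both translations are sound: if $e \stackrel{\mathrm{name}}{\leadsto} e'$ then $\llparenthesis e \rrparenthesis^{\mathrm{name}} \equiv \llparenthesis e' \rrparenthesis^{\mathrm{name}}$,
and if $e \stackrel{\mathrm{need}}{\leadsto} e'$ then $\llparenthesis e \rrparenthesis^{\mathrm{need}} \equiv \llparenthesis e' \rrparenthesis^{\mathrm{need}}$. To do this for call-by-need, we first look
at translations of evaluation contexts. The following lemma says the translation
captures the idea that the hole in an evaluation context corresponds to the term
being evaluated.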


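Lemma 4. Define, for each evaluation context $E[-]$, the term $\mathcal{E}_y\llparenthesis E[-] \rrparenthesis^{\mathrm{need}}$ by:

\[
\begin{aligned}
\mathcal{E}_y\llparenthesis - \rrparenthesis^{\mathrm{need}} &:= \mathsf{return}\ y \\
\mathcal{E}_y\llparenthesis \mathsf{if}\ E[-]\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \rrparenthesis^{\mathrm{need}} &:= \mathcal{E}_y\llparenthesis E[-] \rrparenthesis^{\mathrm{need}}\ \mathsf{to}\ x.\ \mathsf{force}(\mathsf{case}\ x\ \mathsf{of}\ \{\mathsf{inl}\ z.\ \mathsf{thunk}\,\llparenthesis e_2 \rrparenthesis^{\mathrm{need}},\ \mathsf{inr}\ z.\ \mathsf{thunk}\,\llparenthesis e_3 \rrparenthesis^{\mathrm{need}}\}) \\
\mathcal{E}_y\llparenthesis E[-]\, e_2 \rrparenthesis^{\mathrm{need}} &:= \mathcal{E}_y\llparenthesis E[-] \rrparenthesis^{\mathrm{need}}\ \mathsf{to}\ z.\ \mathsf{thunk}\,\llparenthesis e_2 \rrparenthesis^{\mathrm{need}}\,`\,\mathsf{force}\ z \\
\mathcal{E}_y\llparenthesis (\lambda x.\, E[x])\, E'[-] \rrparenthesis^{\mathrm{need}} &:= \mathcal{E}_y\llparenthesis E'[-] \rrparenthesis^{\mathrm{need}}\ \mathsf{need}\ x.\ \llparenthesis E[x] \rrparenthesis^{\mathrm{need}} \\
\mathcal{E}_y\llparenthesis (\lambda x.\, E[-])\, e_2 \rrparenthesis^{\mathrm{need}} &:= \llparenthesis e_2 \rrparenthesis^{\mathrm{need}}\ \mathsf{need}\ x.\ \mathcal{E}_y\llparenthesis E[-] \rrparenthesis^{\mathrm{need}}
\end{aligned}
\]

For each expression $e$ we have:

\[ \llparenthesis E[e] \rrparenthesis^{\mathrm{need}} \equiv \llparenthesis e \rrparenthesis^{\mathrm{need}}\ \mathsf{to}\ y.\ \mathcal{E}_y\llparenthesis E[-] \rrparenthesis^{\mathrm{need}} \]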
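This lemma omits the typing of expressions for presentational purposes. It is
easy to add suitable constraints on typing. Soundness is now easy to show:


Theorem 2 (Soundness). For any two well-typed source-language expressions
$\Gamma \vdash e : A$ and $\Gamma \vdash e' : A$:

1. If $e \stackrel{\mathrm{name}}{\leadsto} e'$ then $\llparenthesis e \rrparenthesis^{\mathrm{name}} \equiv \llparenthesis e' \rrparenthesis^{\mathrm{name}}$.
2. If $e \stackrel{\mathrm{need}}{\leadsto} e'$ then $\llparenthesis e \rrparenthesis^{\mathrm{need}} \equiv \llparenthesis e' \rrparenthesis^{\mathrm{need}}$.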

Now that we have sound call-by-name and call-by-need translations, we can
state the meta-level equivalence formally. Suppose we are given a possibly open
source-language expression $\Gamma \vdash e : B$. Recall that the call-by-need translation
uses a context containing computation variables (i.e. $\llparenthesis \Gamma \rrparenthesis^{\mathrm{need}}$) and the
call-by-name translation uses a context containing value variables, which map to thunks
of computations. We have two ECBPV computation terms of type $\mathrm{Fr}\,\llparenthesis B \rrparenthesis$ in
context $\llparenthesis \Gamma \rrparenthesis^{\mathrm{need}}$: one is just $\llparenthesis e \rrparenthesis^{\mathrm{need}}$, the other is $\llparenthesis e \rrparenthesis^{\mathrm{name}}$ with all of its variables
substituted with thunked computations. The theorem then states that these are
contextually equivalent.


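Theorem 3 (Equivalence between call-by-name and call-by-need). For
all source-language expressions $e$ satisfying $x_1 : A_1, \ldots, x_n : A_n \vdash e : B$,

\[ \llparenthesis e \rrparenthesis^{\mathrm{name}}[x_1 \mapsto \mathsf{thunk}\ x_1, \ldots, x_n \mapsto \mathsf{thunk}\ x_n] \cong_{\mathrm{ctx}} \llparenthesis e \rrparenthesis^{\mathrm{need}} \]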


Proof. The proof of this theorem is by induction on the typing derivation of e. 
The interesting case is lambda abstraction, where we use the internal equivalence 
between call-by-name and call-by-need (Theorem 1). 
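
As a small illustration of the statement, consider the variable case $e = x_i$. Assuming, as is usual, that provable equality $\equiv$ implies contextual equivalence, the two sides differ only by a $\beta$ step for thunks:

\[
\llparenthesis x_i \rrparenthesis^{\mathrm{name}}[x_1 \mapsto \mathsf{thunk}\ x_1, \ldots, x_n \mapsto \mathsf{thunk}\ x_n]
 = \mathsf{force}\,(\mathsf{thunk}\ x_i)
 \equiv x_i
 = \llparenthesis x_i \rrparenthesis^{\mathrm{need}}
\]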


4 An Effect System for Extended Call-by-Push-Value


The equivalence between call-by-name and call-by-need in the previous section 
is predicated on the only effect in the language being nontermination. However, 
suppose the primitives of the language have various effects (which means that in
general the equivalence fails) but a given subprogram may be statically shown 
to have at most nontermination effects. In this case, we should be allowed to 
exploit the equivalence on the subprogram, interchanging call-by-need and call- 
by-name locally, even if the rest of the program uses other effects. In this section, 
we describe an effect system [20] for ECBPV, which statically estimates the side- 
effects of expressions, allowing us to exploit equivalences which hold only within 
subprograms. Effect systems can also be used for other purposes, such as proving 
the correctness of effect-dependent program transformations [7,29]. The ECBPV 
effect system also allows these. 

Call-by-need makes statically estimating effects difficult. Computation vari- 
ables bound using need might have effects on their first use, but on subsequent 
uses do not. Hence to precisely determine the effects of a term, we must track 
which variables have been used. McDermott and Mycroft [23] show how to 
achieve this for a call-by-need effect system; their technique can be adapted 
to ECBPV. Here we take a simpler approach. By slightly restricting the effect
algebras we consider, we remove the need to track variable usage information, 
while still ensuring the effect information is not an underestimate (an underesti- 
mate would enable incorrect transformations). This can reduce the precision of 
the effect information obtained, but for our use case (determining equivalences 
between evaluation orders) this is not an issue, since we primarily care about 
which effects are used (rather than e.g. how many times they are used). 


4.1 Effects 


The effect system is parameterized by an effect algebra, which specifies the infor- 
mation that is tracked. Different effect algebras can be chosen for different appli- 
cations. There are various forms of effect algebra. We follow Katsumata [15] and 
use preordered monoids, which are the most general. 


Definition 5 (Preordered monoid). A preordered monoid $(F, \le, \cdot, 1)$ consists
of a monoid $(F, \cdot, 1)$ and a preorder $\le$ on $F$, such that the binary operation
$\cdot$ is monotone in each argument separately.


Since we do not track variable usage information, we might misestimate the 
effect of a call-by-need computation variable evaluated for a second time (whose 
true effect is 1). To ensure this misestimate is an overestimate, we assume that 
the effect algebra is pointed (which is the case for most applications). 


Definition 6 (Pointed preordered monoid). A preordered monoid $(F, \le, \cdot, 1)$
is pointed if for all $f \in F$ we have $1 \le f$.


The elements $f$ of the set $F$ are called effects. Each effect abstractly represents
some potential side-effecting behaviours. The order $\le$ provides approximation of
effects. When $f \le f'$ this means behaviours represented by $f$ are included in
those represented by $f'$. The binary operation $\cdot$ represents sequencing of effects,
and $1$ is the effect of a side-effect-free expression.

Traditional (Gifford-style) effect systems have some set $\Sigma$ of operations (for
example, $\Sigma := \{\mathsf{read}, \mathsf{write}\}$), and use the preordered monoid $(\mathcal{P}\Sigma, \subseteq, \cup, \emptyset)$. In
these cases, an effect $f$ is just a set of operations. If a computation has effect $f$
then $f$ contains all of the operations the computation may perform. They can
therefore be used to enforce that computations do not use particular operations.
Another example is the preordered monoid $(\mathbb{N}^{+}, \le, \cdot, 1)$, which can be used to
count the number of possible results a nondeterministic computation can return
(or to count the number of times an operation is used).

For our main example, where we wish to establish whether the effects of an
expression are restricted to nontermination, we use the two-element
preorder $\{\mathsf{diveff} \le \top\}$ with join for sequencing and $\mathsf{diveff}$ as the unit $1$.
The effect $\mathsf{diveff}$ means side-effects restricted to (at most) nontermination, and $\top$
means unrestricted side-effects. Thus we would enable the equivalence between
call-by-name and call-by-need when the effect is $\mathsf{diveff}$, and not when it is $\top$. All
of these examples are pointed. Others can be found in the literature.
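
The definitions above can be checked by brute force on small finite examples. The following sketch is illustrative only: the class EffectAlgebra and the helper names are hypothetical, and the checks simply enumerate all elements. It builds the Gifford-style algebra over {read, write} and the two-element algebra, and contrasts them with a non-pointed variant that uses the discrete order.

```python
from itertools import combinations, product


class EffectAlgebra:
    """A finite candidate preordered monoid (elements, <=, sequencing, unit)."""

    def __init__(self, elements, leq, op, unit):
        self.elements = list(elements)
        self.leq, self.op, self.unit = leq, op, unit


def is_preordered_monoid(a):
    es = a.elements
    reflexive = all(a.leq(f, f) for f in es)
    transitive = all(a.leq(f, h) for f, g, h in product(es, repeat=3)
                     if a.leq(f, g) and a.leq(g, h))
    associative = all(a.op(a.op(f, g), h) == a.op(f, a.op(g, h))
                      for f, g, h in product(es, repeat=3))
    unital = all(a.op(a.unit, f) == f == a.op(f, a.unit) for f in es)
    # monotone in each argument separately
    monotone = all(a.leq(a.op(f, h), a.op(g, h)) and a.leq(a.op(h, f), a.op(h, g))
                   for f, g, h in product(es, repeat=3) if a.leq(f, g))
    return reflexive and transitive and associative and unital and monotone


def is_pointed(a):
    """Pointed: the unit is below every effect."""
    return all(a.leq(a.unit, f) for f in a.elements)


def powerset(ops):
    ops = list(ops)
    return [frozenset(c) for n in range(len(ops) + 1) for c in combinations(ops, n)]


# Gifford-style: sets of operations, ordered by inclusion, sequenced by union.
gifford = EffectAlgebra(powerset({"read", "write"}),
                        leq=lambda f, g: f <= g,
                        op=lambda f, g: f | g,
                        unit=frozenset())

# Two-element algebra: diveff <= top, sequencing is join, unit is diveff.
DIVEFF, TOP = "diveff", "top"
two_point = EffectAlgebra([DIVEFF, TOP],
                          leq=lambda f, g: f == g or g == TOP,
                          op=lambda f, g: TOP if TOP in (f, g) else DIVEFF,
                          unit=DIVEFF)

# Same monoid with the discrete order: a preordered monoid that is not pointed.
discrete = EffectAlgebra([DIVEFF, TOP], leq=lambda f, g: f == g,
                         op=two_point.op, unit=DIVEFF)

for name, alg in [("gifford", gifford), ("two_point", two_point), ("discrete", discrete)]:
    print(name, is_preordered_monoid(alg), is_pointed(alg))
```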
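Value types, $A <:_v B$:

\[
\frac{}{\mathsf{unit} <:_v \mathsf{unit}}
\qquad
\frac{A_1 <:_v B_1 \quad A_2 <:_v B_2}{A_1 \times A_2 <:_v B_1 \times B_2}
\qquad
\frac{A_1 <:_v B_1 \quad A_2 <:_v B_2}{A_1 + A_2 <:_v B_1 + B_2}
\qquad
\frac{C <: D}{U\,C <:_v U\,D}
\]

Computation types, $C <: D$:

\[
\frac{(C_i <: D_i)_{i \in I}}{\prod_{i \in I} C_i <: \prod_{i \in I} D_i}
\qquad
\frac{A <:_v B \quad C <: D}{(B \to C) <: (A \to D)}
\qquad
\frac{A <:_v B \quad f \le f'}{\langle f \rangle A <: \langle f' \rangle B}
\]


Fig. 7. Subtyping in the ECBPV effect system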


4.2 Effect System and Signature 


The effect system includes effects within types. Specifically, each computation of
returner type will have some side-effects when it is run, and hence each returner
type $\mathrm{Fr}\,A$ is annotated with an element $f$ of $F$. We write the annotated type
as $\langle f \rangle A$. Formally we replace the grammar of ECBPV computation types (and
similarly, the grammar of typing contexts) with

\[
\begin{aligned}
C, D &::= \textstyle\prod_{i \in I} C_i \mid A \to C \mid \langle f \rangle A \\
\Gamma &::= \diamond \mid \Gamma, x : A \mid \Gamma, x : \langle f \rangle A
\end{aligned}
\]

(The annotated returner types $\langle f \rangle A$ are the only differences.) The grammar used for value
types is unchanged, except that it uses the new syntax of computation types.

The definition of ECBPV signature is similarly extended to contain the effect 
algebra as well as the set of constants: 


Definition 7 (Signature). A signature $(F, K)$ consists of a pointed preordered
monoid $(F, \le, \cdot, 1)$ of effects and, for each value type $A$, a set $K_A$ of constants
of type $A$, including $() \in K_{\mathsf{unit}}$.


We assume a fixed effect system signature for the remainder of this section.
Since types contain effects, which have a notion of subeffecting, there is a
natural notion of subtyping. We define (in Fig. 7) two subtyping relations: $A <:_v B$
for value types and $C <: D$ for computation types.
We treat the type constructor $\langle f \rangle$ as an operation on computation types by
defining computation types $\langle f \rangle C$:

\[
\langle f \rangle \Big(\prod_{i \in I} C_i\Big) := \prod_{i \in I} \langle f \rangle C_i
\qquad
\langle f \rangle (A \to C) := A \to \langle f \rangle C
\qquad
\langle f \rangle (\langle f' \rangle A) := \langle f \cdot f' \rangle A
\]

This is an action of the preordered monoid on computation types. Its purpose
is to give the typing rule for sequencing of computations. The sequencing of a
computation with effect $f$ with a computation of type $C$ has type $\langle f \rangle C$.
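
The action can be sketched as a recursive function over computation types. This is a minimal sketch under stated assumptions: the action is taken to push through products and function types and to multiply into the annotation of a returner type, computation types are encoded as nested tuples, and the names act and union are hypothetical. The example uses Gifford-style sets of operations with union as sequencing.

```python
def act(f, c, op):
    """Apply the effect f to the computation type c; op is the monoid operation.

    Computation types are encoded as:
      ("prod", (C_1, ..., C_n))   product of computation types
      ("fun", A, C)               function type A -> C
      ("ret", f, A)               annotated returner type <f>A
    """
    tag = c[0]
    if tag == "prod":   # <f>(prod_i C_i) = prod_i <f>C_i
        return ("prod", tuple(act(f, ci, op) for ci in c[1]))
    if tag == "fun":    # <f>(A -> C) = A -> <f>C
        _, a, body = c
        return ("fun", a, act(f, body, op))
    if tag == "ret":    # <f>(<f'>A) = <f . f'>A
        _, g, a = c
        return ("ret", op(f, g), a)
    raise ValueError(f"unknown computation type tag: {tag}")


def union(f, g):
    return f | g


NONE = frozenset()
READ = frozenset({"read"})
WRITE = frozenset({"write"})

c = ("fun", "unit", ("prod", (("ret", NONE, "unit"), ("ret", READ, "unit"))))
print(act(WRITE, c, union))
```

For any monoid the function satisfies the two action laws: acting with the unit leaves the type unchanged, and acting with g and then f is the same as acting once with f sequenced before g.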
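\[
\frac{}{\Gamma \vdash x : \langle f \rangle A}\ \text{if } (x : \langle f \rangle A) \in \Gamma
\qquad
\frac{\Gamma \vdash_v V : A}{\Gamma \vdash \mathsf{return}\ V : \langle 1 \rangle A}
\]
\[
\frac{\Gamma \vdash M : \langle f \rangle A \quad \Gamma, x : A \vdash N : C}{\Gamma \vdash M\ \mathsf{to}\ x.\ N : \langle f \rangle C}
\qquad
\frac{\Gamma \vdash M : \langle f \rangle A \quad \Gamma, x : \langle f \rangle A \vdash N : C}{\Gamma \vdash M\ \mathsf{need}\ x.\ N : C}
\]
\[
\frac{\Gamma \vdash_v V : A}{\Gamma \vdash_v V : B}\ \text{if } A <:_v B
\qquad
\frac{\Gamma \vdash M : C}{\Gamma \vdash M : D}\ \text{if } C <: D
\]


Fig. 8. Effect system modifications to ECBPV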


The typing judgements have exactly the same form as before (except for 
the new syntax of types). The majority of the typing rules, including all of the 
rules for value terms, are also unchanged. The only rules we change are those 
for computation variables, return, to and need, which are replaced with the first 
four rules in Fig. 8. We also add two subtyping rules, one for values and one for 
computations. These are the last two rules of Fig. 8. 

The equational theory does not need to be changed to use it with the new
effect system (except that the types appearing in each axiom now include effect
information). For each axiom of the equational theory, the two terms still have
the same type in the effect system. In particular, for the axiom

\[ M\ \mathsf{need}\ x.\ x\ \mathsf{to}\ y.\ N \equiv M\ \mathsf{to}\ y.\ N[x \mapsto \mathsf{return}\ y] \]

if $\Gamma \vdash M : \langle f \rangle A$ and $\Gamma, x : \langle f \rangle A, y : A \vdash N : C$ then the left-hand side has type
$\langle f \rangle C$. For the right-hand side, we have $\Gamma, y : A \vdash N[x \mapsto \mathsf{return}\ y] : C$, because of
the assumption that the preordered monoid is pointed (which implies $\mathsf{return}\ y$ can
have any effect by subtyping, not just the unit effect $1$). Hence the right-hand
side also has type $\langle f \rangle C$. This axiom is the reason for our pointedness requirement.
In particular, if we drop need from the language, the pointedness requirement is
no longer needed. Thus the rules we give also describe a fully general effect system
for CBPV in which the effect algebra can be any preordered monoid.
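
The way the rules for return, to, need and computation variables assign effects can be sketched as a small function over terms of returner type. This is an illustration only, not a type checker: it uses the two-element algebra with diveff below top, terms are nested tuples, the "const" form stands for a hypothetical effectful primitive with a given effect, and bound names are assumed to be distinct.

```python
DIVEFF, TOP = "diveff", "top"


def seq(f, g):
    """Sequencing in the two-element algebra: the join, with diveff as unit."""
    return TOP if TOP in (f, g) else DIVEFF


def effect(term, env=None):
    """Effect annotation of the returner type assigned to term.

    env maps need-bound computation variables to the effect of the
    computation they stand for.
    """
    env = {} if env is None else env
    tag = term[0]
    if tag == "return":   # return V : <1>A
        return DIVEFF
    if tag == "const":    # hypothetical primitive with a declared effect
        return term[1]
    if tag == "var":      # x : <f>A in the context
        return env[term[1]]
    if tag == "to":       # M : <f>A and N : C give M to x. N : <f>C
        _, m, _x, n = term
        return seq(effect(m, env), effect(n, env))
    if tag == "need":     # M : <f>A and x : <f>A |- N : C give M need x. N : C
        _, m, x, n = term
        return effect(n, {**env, x: effect(m, env)})
    raise ValueError(f"unknown term tag: {tag}")


write = ("const", TOP)
unused = ("need", write, "x", ("return",))
used_twice = ("need", write, "x", ("to", ("var", "x"), "y", ("var", "x")))
sequenced = ("to", write, "y", ("return",))

print(effect(unused))      # diveff: the bound computation is never run
print(effect(used_twice))  # top
print(effect(sequenced))   # top
```

The second example shows the overestimate discussed above: each use of the need-bound variable is charged the full effect of the bound computation, even though only the first use can actually perform it.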


4.3 Exploiting Effect-Dependent Equivalences 


Our primary goal in adding an effect system to ECBPV is to exploit (local, effect- 
justified) equivalences between evaluation orders even without a whole-language 
restriction on effects. We sketch how to do this for our example. 

When proving the equivalence between call-by-name and call-by-need in
Sect. 3 we assumed that the only constants in the language were $()$ and $\bot_A : U\,(\mathrm{Fr}\,A)$.
To relax this restriction, we use the effect algebra with preorder
$\{\mathsf{diveff} \le \top\}$ described above, and change the type of $\bot_A$ from $U\,(\mathrm{Fr}\,A)$ to
$U\,(\langle \mathsf{diveff} \rangle A)$. We can include other effectful constants, and give them the effect
$\top$ (e.g. $\mathsf{write} : U\,(V \to \langle \top \rangle \mathsf{unit})$).
The statement of the internal (object-level) equivalence becomes:

    if $\Gamma \vdash M : \langle \mathsf{diveff} \rangle A$ and $\Gamma, x : \langle \mathsf{diveff} \rangle A \vdash N : C$ then
    $\Gamma \vdash M\ \mathsf{name}\ x.\ N \cong_{\mathrm{ctx}} M\ \mathsf{need}\ x.\ N : C$

The premise restricts the effect of $M$ to $\mathsf{diveff}$ so that nontermination is its
only possible side-effect. To prove this equivalence, we need a logical relation for
the effect system, which means we have to define a Kripke relation $R_{\langle f \rangle A}$ for
each effect $f$. For $R_{\langle \mathsf{diveff} \rangle A}$ we use the same definition as before (the definition
of $R_{\mathrm{Fr}\,A}$). The definition of $R_{\langle \top \rangle A}$ depends on the specific other effects included.

To state and prove a meta-level equivalence for a source language that
includes other side-effects, we need to define an effect system for the source
language. This would use the same effect algebra as the ECBPV effect system,
and be such that the translation of source language expressions preserves effects.
To do this for the source language of Sect. 3, we replace the syntax of function
types with $\langle f \rangle A \xrightarrow{f'} B$, where $f$ is the effect of the argument (required due to
lazy evaluation), and $f'$ is the latent effect of the function (the effect it has after
application). The translation is then

\[ \llparenthesis \langle f \rangle A \xrightarrow{f'} B \rrparenthesis := U\,(U\,(\langle f \rangle \llparenthesis A \rrparenthesis) \to \langle f' \rangle \llparenthesis B \rrparenthesis) \]

Just as for the object-level equivalence, the statement of the meta-level
equivalence requires the source-language expression to have the effect $\mathsf{diveff}$.
We omit the details here.


5 Related Work 


Metalanguages for Evaluation Order. Call-by-push-value is similar to Moggi’s 
monadic metalanguage [25], except for the distinction between computations 
and values. Both support several evaluation orders, but neither supports call- 
by-need. Polarized type theories [34] also take the approach of stratifying types 
into several kinds to capture multiple evaluation orders. Downen and Ariola [10] 
recently described how to capture call-by-need using polarity. They take a dif- 
ferent approach to ours, by splitting up terms according to their evaluation 
order, rather than whether they might have effects. This means they have three 
kinds of type, resulting in a more complex language than ours. They also do 
not apply their language to reasoning about the differences between evaluation 
orders, which was the primary motivation for ECBPV. It is not clear whether 
their language can also be used for this purpose. 

Multiple evaluation orders can also be captured in a Moggi-style language 
by using joinads instead of monads [28]. It is possible that there is some joinad 
structure implicit in extended call-by-push-value. 
Reasoning About Call-by-Need. The majority of work on reasoning about
call-by-need source languages has concentrated on operational semantics based on
environments [16], graphs [30,32], and answers [2,3,9,22]. However, these do not 
compare call-by-need with other evaluation orders. The only type-based analysis 
of a lazy source language we know of apart from McDermott and Mycroft’s effect 
system [23] is [31,33]. 


Logical Relations. Kripke logical relations have previously been applied to the 
problems of lambda definability [12] and normalization [1,11]. Previous proofs 
of contextual equivalence relate only closed terms. We were forced to relate open 
terms because of the need construct. 

Reasoning about effects using logical relations often runs into a difficulty 
in ensuring the relations are closed under sequencing of computations. We are 
able to work around this due to our specific choice of effects. It is possible that 
considering other effects would require a technique such as Lindley and Stark’s 
leapfrog method [18,19]. 


Effect Systems. Effect systems have a long history, starting with Gifford-style 
effect systems [20]. We use preordered monoids as effect algebras following Kat- 
sumata [15]. Almost all of the previous work on effect systems has concentrated 
on call-by-value only. Kammar and Plotkin [13,14] describe a Gifford-style call- 
by-push-value effect system, though their formulation does not generalise to 
other effect algebras. Our effect system is the first general effect system for a 
CBPV-like language. The only previous work on call-by-need effects is [23]. 

There has also been much work on reasoning about program transforma- 
tions using effect systems, e.g. [4-8,29]. We expect it to be possible to recast 
much of this in terms of extended call-by-push-value, and therefore apply these 
transformations for various evaluation orders. 


6 Conclusions and Future Work 


We have described extended call-by-push-value, a calculus that can be used for 
reasoning about several evaluation orders. In particular, ECBPV supports call- 
by-need via the addition of the construct M need x. N. This allows us to prove 
that call-by-name and call-by-need reduction are equivalent if nontermination is 
the only effect in the source language, both inside the language itself, and on the 
meta-level. We proved the latter by giving two translations of a source language 
into ECBPV: one that captures call-by-name reduction, and one that captures 
call-by-need reduction. We also defined an effect system for ECBPV. The effect 
system statically bounds the side-effects of terms, allowing equivalences between 
evaluation orders to be used without restricting the entire language to particular 
effects. We close with a description of possible future work. 


Other Equivalences Between Evaluation Orders. We have proved one example 
of an equivalence between evaluation orders using ECBPV, but there are others 
that we might also expect to hold. For example, we would expect call-by-need
and call-by-value to be equivalent if the effects are restricted to nondeterminism, 
allocating state, and reading from state (but not writing). It should be possible to 
use ECBPV to prove these by defining suitable logical relations. More generally, 
it might be possible to characterize when particular equivalences hold in terms 
of the algebraic properties of the effects we restrict to. 


Denotational Semantics. Using logical relations to prove contextual equivalence 
between terms directly is difficult. Adequate denotational semantics would allow 
us to reduce proofs of contextual equivalence to proofs of equalities in the model. 
Composing the denotational semantics with the call-by-need translation would 
also result in a call-by-need denotational semantics for the source language. Some 
potential approaches to describing the denotational semantics of ECBPV are 
Maraist et al.’s [21] translation into an affine calculus, combined with a semantics 
of linear logic [24], and also continuation-passing-style translations [27]. None of 
these consider side-effects however. 


Acknowledgements. We gratefully acknowledge the support of an EPSRC stu- 
dentship, and thank the anonymous reviewers for helpful comments. 


References 


1. Altenkirch, T., Hofmann, M., Streicher, T.: Categorical reconstruction of a
reduction free normalization proof. In: Pitt, D., Rydeheard, D.E., Johnstone, P. (eds.)
CTCS 1995. LNCS, vol. 953, pp. 182-199. Springer, Heidelberg (1995).
https://doi.org/10.1007/3-540-60164-3_27

2. Ariola, Z.M., Felleisen, M.: The call-by-need lambda calculus. J. Funct. Program.
7(3), 265-301 (1997)

3. Ariola, Z.M., Maraist, J., Odersky, M., Felleisen, M., Wadler, P.: A call-by-need
lambda calculus. In: Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium
on Principles of Programming Languages, pp. 233-246. ACM (1995).
https://doi.org/10.1145/199448.199507

4. Benton, N., Hofmann, M., Nigam, V.: Effect-dependent transformations for
concurrent programs. In: Proceedings of the 18th International Symposium on Principles
and Practice of Declarative Programming, pp. 188-201. ACM (2016).
https://doi.org/10.1145/2967973.2968602

5. Benton, N., Kennedy, A.: Monads, effects and transformations. Electron. Notes
Theor. Comput. Sci. 26, 3-20 (1999).
https://doi.org/10.1016/S1571-0661(05)80280-4

6. Benton, N., Kennedy, A., Hofmann, M., Nigam, V.: Counting successes: effects
and transformations for non-deterministic programs. In: Lindley, S., McBride, C.,
Trinder, P., Sannella, D. (eds.) A List of Successes That Can Change the World.
LNCS, vol. 9600, pp. 56-72. Springer, Cham (2016).
https://doi.org/10.1007/978-3-319-30936-1_3

7. Benton, N., Kennedy, A., Russell, G.: Compiling standard ML to Java bytecodes.
In: Proceedings of the Third ACM SIGPLAN International Conference on
Functional Programming, pp. 129-140. ACM (1998).
https://doi.org/10.1145/289423.289435


8. Birkedal, L., Sieczkowski, F., Thamsborg, J.: A concurrent logical relation. In:
Cégielski, P., Durand, A. (eds.) 21st EACSL Annual Conference on Computer Science
Logic, CSL 2012. Leibniz International Proceedings in Informatics (LIPIcs),
vol. 16, pp. 107-121. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Dagstuhl
(2012). https://doi.org/10.4230/LIPIcs.CSL.2012.107

9. Chang, S., Felleisen, M.: The call-by-need lambda calculus, revisited. In: Seidl,
H. (ed.) ESOP 2012. LNCS, vol. 7211, pp. 128-147. Springer, Heidelberg (2012).
https://doi.org/10.1007/978-3-642-28869-2_7

10. Downen, P., Ariola, Z.M.: Beyond polarity: towards a multi-discipline intermediate
language with sharing. In: 27th EACSL Annual Conference on Computer Science
Logic, CSL 2018, pp. 21:1-21:23 (2018).
https://doi.org/10.4230/LIPIcs.CSL.2018.21

11. Fiore, M.: Semantic analysis of normalisation by evaluation for typed lambda calculus.
In: Proceedings of the 4th ACM SIGPLAN International Conference on Principles
and Practice of Declarative Programming, pp. 26-37. ACM (2002).
https://doi.org/10.1145/571157.571161

12. Jung, A., Tiuryn, J.: A new characterization of lambda definability. In: Bezem, M.,
Groote, J.F. (eds.) TLCA 1993. LNCS, vol. 664, pp. 245-257. Springer, Heidelberg
(1993). https://doi.org/10.1007/BFb0037110

13. Kammar, O.: Algebraic theory of type-and-effect systems. Ph.D. thesis, University
of Edinburgh, UK (2014)

14. Kammar, O., Plotkin, G.D.: Algebraic foundations for effect-dependent optimisations.
In: Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium
on Principles of Programming Languages, pp. 349-360. ACM (2012).
https://doi.org/10.1145/2103656.2103698

15. Katsumata, S.: Parametric effect monads and semantics of effect systems. In:
Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of
Programming Languages, pp. 633-645. ACM (2014).
https://doi.org/10.1145/2535838.2535846

16. Launchbury, J.: A natural semantics for lazy evaluation. In: Proceedings of the 20th
ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
pp. 144-154. ACM (1993). https://doi.org/10.1145/158511.158618

17. Levy, P.B.: Call-by-push-value: a subsuming paradigm. In: Girard, J.-Y. (ed.)
TLCA 1999. LNCS, vol. 1581, pp. 228-243. Springer, Heidelberg (1999).
https://doi.org/10.1007/3-540-48959-2_17

18. Lindley, S.: Normalisation by evaluation in the compilation of typed functional
programming languages. Ph.D. thesis, University of Edinburgh, UK (2005)

19. Lindley, S., Stark, I.: Reducibility and ⊤⊤-lifting for computation types. In:
Urzyczyn, P. (ed.) TLCA 2005. LNCS, vol. 3461, pp. 262-277. Springer, Heidelberg
(2005). https://doi.org/10.1007/11417170_20

20. Lucassen, J.M., Gifford, D.K.: Polymorphic effect systems. In: Proceedings of the
15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
pp. 47-57. ACM (1988). https://doi.org/10.1145/73560.73564

21. Maraist, J., Odersky, M., Turner, D.N., Wadler, P.: Call-by-name, call-by-value,
call-by-need, and the linear lambda calculus. In: Proceedings of the Eleventh
Annual Mathematical Foundations of Programming Semantics Conference, pp.
370-392 (1995). https://doi.org/10.1016/S0304-3975(98)00358-2

22. Maraist, J., Odersky, M., Wadler, P.: The call-by-need lambda calculus. J. Funct.
Program. 8(3), 275-317 (1998). https://doi.org/10.1017/S0956796898003037

23. McDermott, D., Mycroft, A.: Call-by-need effects via coeffects. Open Comput. Sci.
8, 93-108 (2018). https://doi.org/10.1515/comp-2018-0009


24. Melliès, P.A.: Categorical semantics of linear logic. In: Interactive Models of
Computation and Program Behaviour, Panoramas et Synthèses 27, Société
Mathématique de France (2009)

25. Moggi, E.: Notions of computation and monads. Inf. Comput. 93(1), 55-92 (1991).
https://doi.org/10.1016/0890-5401(91)90052-4

26. Munch-Maccagnoni, G.: Models of a non-associative composition. In: Muscholl, A.
(ed.) FoSSaCS 2014. LNCS, vol. 8412, pp. 396-410. Springer, Heidelberg (2014).
https://doi.org/10.1007/978-3-642-54830-7_26

27. Okasaki, C., Lee, P., Tarditi, D.: Call-by-need and continuation-passing style. LISP
Symbolic Comput. 7(1), 57-81 (1994). https://doi.org/10.1007/BF01019945

28. Petricek, T., Syme, D.: Joinads: a retargetable control-flow construct for reactive,
parallel and concurrent programming. In: Rocha, R., Launchbury, J. (eds.) PADL
2011. LNCS, vol. 6539, pp. 205-219. Springer, Heidelberg (2011).
https://doi.org/10.1007/978-3-642-18378-2_17

29. Tolmach, A.: Optimizing ML using a hierarchy of monadic types. In: Leroy, X.,
Ohori, A. (eds.) TIC 1998. LNCS, vol. 1473, pp. 97-115. Springer, Heidelberg
(1998). https://doi.org/10.1007/BFb0055514

30. Turner, D.A.: A new implementation technique for applicative languages. Softw.
Pract. Experience 9(1), 31-49 (1979). https://doi.org/10.1002/spe.4380090105

31. Turner, D.N., Wadler, P., Mossin, C.: Once upon a type. In: Proceedings of the Seventh
International Conference on Functional Programming Languages and Computer
Architecture, pp. 1-11. ACM (1995). https://doi.org/10.1145/224164.224168

32. Wadsworth, C.: Semantics and Pragmatics of the Lambda-Calculus. University of
Oxford (1971)

33. Wansbrough, K., Peyton Jones, S.: Once upon a polymorphic type. In: Proceedings
of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, pp. 15-28. ACM (1999). https://doi.org/10.1145/292540.292545

34. Zeilberger, N.: The logical basis of evaluation order and pattern-matching. Ph.D.
thesis, Carnegie Mellon University, Pittsburgh, PA, USA (2009)


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the chapter’s 
Creative Commons license, unless indicated otherwise in a credit line to the material. If 
material is not included in the chapter’s Creative Commons license and your intended 
use is not permitted by statutory regulation or exceeds the permitted use, you will 
need to obtain permission directly from the copyright holder. 


Effectful Normal Form Bisimulation 


Ugo Dal Lago¹,² and Francesco Gavazzo¹,²

1 University of Bologna, Bologna, Italy 
2 Inria Sophia Antipolis, Sophia Antipolis Cedex, France 
ugo.dallago@unibo.it, francesco.gavazzo@gmail.com 


Abstract. Normal form bisimulation, also known as open bisimulation, 
is a coinductive technique for higher-order program equivalence in which 
programs are compared by looking at their essentially infinitary tree-like 
normal forms, i.e. at their Böhm or Lévy-Longo trees. The technique
has been shown to be useful not only when proving metatheorems about
λ-calculi and their semantics, but also when looking at concrete examples
of terms. In this paper, we show that there is a way to generalise
normal form bisimulation to calculi with algebraic effects, à la Plotkin 
and Power. We show that some mild conditions on monads and rela- 
tors, which have already been shown to guarantee effectful applicative 
bisimilarity to be a congruence relation, are enough to prove that the 
obtained notion of bisimilarity, which we call effectful normal form bisim- 
ilarity, is a congruence relation, and thus sound for contextual equiv- 
alence. Additionally, contrary to applicative bisimilarity, normal form 
bisimilarity allows for enhancements of the bisimulation proof method, 
hence providing a powerful reasoning principle for effectful programming
languages. 


1 Introduction 


The study of program equivalence has always been one of the central tasks of 
programming language theory: giving satisfactory definitions and methodologies 
for it can be fruitful in contexts like program verification and compiler optimi- 
sation design, but also helps in understanding the nature of the programming 
language at hand. This is particularly true when dealing with higher-order lan- 
guages, in which giving satisfactory notions of program equivalence is well-known 
to be hard. Indeed, the problem has been approached in many different ways. 
One can define program equivalence through denotational semantics, thus rely- 
ing on a model. One could also proceed following the route traced by Morris [51], 
and define programs to be contextually equivalent when they behave the same 
in every context, this way taking program equivalence as the largest adequate 
congruence. 

Both these approaches have their drawbacks, the first one relying on the 
existence of a (not too coarse) denotational model, the latter quantifying over 
all contexts, and thus making concrete proofs of equivalence hard. Among the 
many alternative techniques the research community has proposed over
the years, one can cite logical relations and applicative bisimilarity [1,4,8], both
based on the idea that equivalent higher-order terms should behave the same 
when fed with any (pair of related) inputs. This way, terms are compared mim- 
icking any possible action a discriminating context could possibly perform on 
the tested terms. In other words, the universal quantification on all possible 
contexts, although not explicitly present, is anyway implicitly captured by the 
bisimulation or logical game. 

Starting from the pioneering work by Böhm, another way of defining program
equivalence has proved extremely useful not only when giving metatheorems
about λ-calculi and programming languages, but also when proving concrete
programs to be (contextually) equivalent. What we are referring to, of
course, is the notion of a Böhm tree of a λ-term e (see [5] for a formal definition),
which is a possibly infinite tree representing the head normal form h of
e, if e has one, but also analyzing the arguments to the head variable of h in
a coinductive way. The celebrated Böhm Theorem, also known as Separation
Theorem [11], stipulates that two terms are contextually equivalent if and only
if their respective (appropriately η-equated) Böhm trees are the same.

The notion of equivalence induced by Böhm trees can be characterised without
any reference to trees, by means of a suitable bisimilarity relation [37,65].
Additionally, Böhm trees can also be defined when λ-terms are not evaluated
to their head normal form, like in the classical theory of λ-calculus, but to their
weak head normal form (like in the call-by-name λ-calculus [37,65]), or to their eager
normal form (like in the call-by-value λ-calculus [38]). In both cases, the notion of
program equivalence one obtains by comparing the syntactic structure of trees, 
admits an elegant coinductive characterisation as a suitable bisimilarity relation. 
The family of bisimilarity relations thus obtained goes under the generic name 
of normal form bisimilarity. 

Real world functional programming languages, however, come equipped not 
only with higher-order functions, but also with computational effects, turning 
them into impure languages in which functions cannot be seen merely as turning
an input into an output. This requires switching to a new model, which cannot
be the usual, pure, λ-calculus. Indeed, program equivalence in effectful
λ-calculi [49,56] has been studied by way of denotational semantics [18,20,31],
logical relations [10,14], applicative bisimilarity [13,16,36], and normal form 
bisimilarity [20,41]. While the denotational semantics, logical relation seman- 
tics, and applicative bisimilarity of effectful calculi have been studied in the 
abstract [15,25,30], the same cannot be said about normal form bisimilarity. 
Particularly relevant for our purposes is [15], where a notion of applicative bisim- 
ilarity for generic algebraic effects, called effectful applicative bisimilarity, based 
on the (standard) notion of a monad, and on the (less standard) notion of a 
relator [71] or lax extension [6,26], is introduced. 

Intuitively, a relator is an abstraction axiomatising the structural prop- 
erties of relation lifting operations. This way, relators allow for an abstract 
description of the possible ways a relation between programs can be lifted to a 
relation between (the results of) effectful computations, the latter being
described through monads and algebraic operations. Several concrete notions
of program equivalence, such as pure, nondeterministic and probabilistic applica- 
tive bisimilarity [1,16,36,52] can be analysed using relators. Additionally, besides 
their prime role in the study of effectful applicative bisimilarity, relators have 
also been used to study logic-based equivalences [67] and applicative distances 
[23] for languages with generic algebraic effects. 

The main contribution of [15] consists in devising a set of axioms on monads 
and relators (summarised in the notions of a Σ-continuous monad and a
Σ-continuous relator) which are both satisfied by many concrete examples, and
that abstractly guarantee that the associated notion of applicative bisimilarity 
is a congruence. 

In this paper, we show that an abstract notion of normal form (bi)simulation 
can indeed be given for calculi with algebraic effects, thus defining a theory analogous
to [15]. Remarkably, we show that the defining axioms of Σ-continuous
monads and Σ-continuous relators guarantee the resulting notion of normal form
(bi)similarity to be a (pre)congruence relation, thus enabling compositional rea- 
soning about program equivalence and refinement. Given that these axioms have 
already been shown to hold in many relevant examples of calculi with effects, 
our work shows that there is a way to “cook up” notions of effectful normal 
form bisimulation without having to reprove congruence of the obtained notion 
of program equivalence: this comes somehow for free. Moreover, this holds both 
when call-by-name and call-by-value program evaluation is considered, although 
in this paper we will mostly focus on the latter, since the call-by-value reduction 
strategy is more natural in the presence of computational effects¹.

Compared to (effectful) applicative bisimilarity, as well as to other standard 
operational techniques—such as contextual and CIU equivalence [47,51], or log- 
ical relations [55,61]—(effectful) normal form bisimilarity has the major advan- 
tage of being an intensional program equivalence, equating programs according 
to the syntactic structure of their (possibly infinitary) normal forms. As a conse- 
quence, in order to deem two programs as normal form bisimilar, it is sufficient 
to test them in isolation, i.e. independently of their interaction with the envi- 
ronment. This way, we obtain easier proofs of equivalence between (effectful) 
programs. Additionally, normal form bisimilarity allows for enhancements of the 
bisimulation proof method [60], hence qualifying as a powerful and effective tool 
for program equivalence. 

Intensionality represents a major difference between normal form bisimilar- 
ity and applicative bisimilarity, where the environment interacts with the tested 
programs by passing them arbitrary input arguments (thus making applicative 
bisimilarity an extensional notion of program equivalence). Testing programs in 
isolation has, however, its drawbacks. In fact, although we prove effectful normal 
form bisimilarity to be a sound proof technique for (effectful) applicative bisimilarity
(and thus for contextual equivalence), full abstraction fails, as already
observed in the case of the pure λ-calculus [3,38] (nonetheless, it is worth mentioning
that full abstraction results are known to hold for calculi with a rich
expressive power [65,68]).


1 Besides, as we will discuss in Sect. 6.4, the formal analysis of call-by-name normal
form bisimilarity strictly follows the corresponding (more challenging) analysis of
call-by-value normal form bisimilarity.

In light of these observations, we devote some energy to studying some con- 
crete examples which highlight the weaknesses of applicative bisimilarity, on the 
one hand, and the strengths of normal form bisimilarity, on the other hand. 

This paper is structured as follows. In Sect. 2 we informally discuss examples
of (pairs of) programs which are operationally equivalent, but whose equivalence
cannot be readily established using standard operational methods. Throughout
this paper, we will show how effectful normal form bisimilarity allows for
handy proofs of such equivalences. Section 3 is dedicated to mathematical preliminaries,
with a special focus on (selected) examples of monads and algebraic
operations. In Sect. 4 we define our vehicle calculus $\Lambda_{\Sigma}$, an untyped λ-calculus
enriched with algebraic operations, to which we give call-by-value monadic operational
semantics. Section 5 introduces relators and their main properties. In
Sect. 6 we introduce effectful eager normal form (bi)similarity, the call-by-value
instantiation of effectful normal form (bi)similarity, and its main metatheoretical
properties. In particular, we prove effectful eager normal form (bi)similarity
to be a (pre)congruence relation (Theorem 2) included in effectful applicative
(bi)similarity (Proposition 5). Additionally, we prove soundness of eager normal
bisimulation up-to context (Theorem 3), a powerful enhancement of the bisimulation
proof method that allows for handy proofs of program equivalence. Finally,
in Sect. 6.4 we briefly discuss how to modify our theory to deal with call-by-name
calculi.


2 From Applicative to Normal Form Bisimilarity 


In this section, some examples of (pairs of) programs which can be shown equiv- 
alent by effectful normal form bisimilarity will be provided, giving evidence on 
the flexibility and strength of the proposed technique. We will focus on examples 
drawn from fixed point theory, simply because these, being infinitary in nature, 
are quite hard to deal with using “finitary” techniques like contextual equivalence
or applicative bisimilarity. 


Example 1. Our first example comes from the ordinary theory of pure, untyped 
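λ-calculus. Let us consider Curry’s and Turing’s call-by-value fixed point combinators
Y and Z:


$$Y \triangleq \lambda y.\Delta\Delta, \qquad Z \triangleq \Theta\Theta, \qquad \Delta \triangleq \lambda x.y(\lambda z.xxz), \qquad \Theta \triangleq \lambda x.\lambda y.y(\lambda z.xxyz).$$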


It is well known that Y and Z are contextually equivalent, although proving such 
an equivalence from first principles is doomed to be hard. For that reason, one 
usually looks at proof techniques for contextual equivalence. Here we consider 
applicative bisimilarity [1]. Since in the pure λ-calculus applicative bisimilarity
coincides with the intersection of applicative similarity and its converse, for the
sake of the argument we discuss which difficulties one faces when trying to prove
Z to be applicatively similar to Y. 

Let us try to construct an applicative simulation R relating Y and Z. Clearly
we need to have $(Y, Z) \in \mathcal{R}$. Since Y evaluates to $\lambda y.\Delta\Delta$, and Z evaluates
to $\lambda y.y(\lambda z.\Theta\Theta yz)$, in order for R to be an applicative simulation, we need to
show that for any value v, $(\Delta[v/y]\Delta[v/y],\ v(\lambda z.\Theta\Theta vz)) \in \mathcal{R}$. Since the result
of the evaluation of $\Delta[v/y]\Delta[v/y]$ is the same as that of $v(\lambda z.\Delta[v/y]\Delta[v/y]z)$, we have
reached a point in which we are stuck: in order to ensure $(Y, Z) \in \mathcal{R}$, we need to
show that $(v(\lambda z.\Delta[v/y]\Delta[v/y]z),\ v(\lambda z.\Theta\Theta vz)) \in \mathcal{R}$. However, the value v being
provided by the environment, no information on it is available. That is, we have
no information on how v tests its input program. In particular, given any context
$C[-]$, we can consider the value $\lambda x.C[x]$, meaning that proving Y and Z to be
applicatively bisimilar is almost as hard as proving them to be contextually
equivalent from first principles.

As we will see, proving Z to be normal form similar to Y is straightforward,
since in order to test $\lambda y.\Delta\Delta$ and $\lambda y.y(\lambda z.\Theta\Theta yz)$, we simply test their
subterms $\Delta\Delta$ and $y(\lambda z.\Theta\Theta yz)$, thus not allowing the environment to influence
computations.
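
To make the two combinators of Example 1 concrete, the following Python sketch transcribes them directly. Python evaluates arguments eagerly, so it plays the role of a call-by-value language here, and the η-expanded subterms (such as λz.xxz) are what keep evaluation from looping. The names `curry_fix`, `turing_fix`, and `fact_step` are illustrative and do not come from the paper. Agreement on a few inputs only illustrates the equivalence discussed above; it does not prove it.

```python
# Curry's call-by-value fixed point combinator: Y = λy.ΔΔ with Δ = λx.y(λz.xxz).
def curry_fix(y):
    delta = lambda x: y(lambda z: x(x)(z))
    return delta(delta)

# Turing's call-by-value fixed point combinator: Z = ΘΘ with Θ = λx.λy.y(λz.xxyz).
theta = lambda x: lambda y: y(lambda z: x(x)(y)(z))
turing_fix = theta(theta)

# Hypothetical test functional: one unfolding of the factorial function.
def fact_step(rec):
    return lambda n: 1 if n == 0 else n * rec(n - 1)

fact_y = curry_fix(fact_step)
fact_z = turing_fix(fact_step)

# Both fixed points compute the same function on these sample inputs.
print([fact_y(n) for n in range(6)])
print([fact_z(n) for n in range(6)])
```

Note that the test functional `fact_step` is exactly the kind of environment-supplied value v that makes the applicative argument get stuck: the sketch fixes one such v, whereas an applicative simulation must cover all of them.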


Example 2. Our next example is a refinement of Example 1 to a probabilistic 
setting, as proposed in [66] (but in a call-by-name setting). We consider a varia- 
tion of Turing’s call-by-value fixed point combinator which, at any iteration, can 
probabilistically decide whether to start another iteration (following the pattern 
of the standard Turing’s fixed point combinator) or to turn for good into Y, 
where Y and Δ are defined as in Example 1:


$$Z \triangleq \Theta\Theta, \qquad \Theta \triangleq \lambda x.\lambda y.(y(\lambda z.\Delta\Delta z) \ \mathbf{or}\ y(\lambda z.xxyz)).$$


Notice that the constructor or behaves as a (fair) probabilistic choice operator, 
hence acting as an effect producer. It is natural to ask whether these new versions
of Y and Z are still equivalent. However, following insights from the previous
example, it is not hard to see that the equivalence between Y and Z cannot be readily
proved by means of standard operational methods such as probabilistic contextual
equivalence [16], probabilistic CIU equivalence and logical relations [10], and
probabilistic applicative bisimilarity [13,16]. All the aforementioned techniques
require testing programs in a given environment (such as a whole context or an
input argument), and are thus ineffective in handling fixed point combinators
such as Y and Z. 

We will give an elementary proof of the equivalence between Y and Z 
in Example 17, and a more elegant proof relying on a suitable up-to context 
technique in Example 18. In [66], the call-by-name counterparts of Y and Z 
are proved to be equivalent using probabilistic environmental bisimilarity. The 
notion of an environmental bisimulation [63] involves both an environment stor- 
ing pairs of terms played during the bisimulation game, and a clause universally 
quantifying over pairs of terms in the evaluation context closure of such an 
environment², thus making environmental bisimilarity a rather heavy technique
to use. Our proof of the equivalence of Y and Z is simpler: in fact, our notion 
of effectful normal form bisimulation does not involve any universal quantifica- 
tion over all possible closed function arguments (like applicative bisimilarity), 
or their evaluation context closure (like environmental bisimilarity), or closed 
instantiation of uses (like CIU equivalence). 


Example 3. Our third example concerns call-by-name calculi and shows how 
our notion of normal form bisimilarity can handle even intricate recursion 
schemes. We consider the following argument-switching probabilistic fixed point 
combinators: 


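$$P \triangleq AA, \qquad A \triangleq \lambda x.\lambda y.\lambda z.(y(xxyz) \ \mathbf{or}\ z(xxzy)),$$
$$Q \triangleq BB, \qquad B \triangleq \lambda x.\lambda y.\lambda z.(y(xxzy) \ \mathbf{or}\ z(xxyz)).$$


We easily see that P and Q satisfy the following (informal) program equations:


$$Pef = e(Pef) \ \mathbf{or}\ f(Pfe), \qquad Qef = e(Qfe) \ \mathbf{or}\ f(Qef).$$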


Again, proving the equivalence between P and Q using applicative bisimilarity 
is problematic. In fact, testing the applicative behaviour of P and Q requires
reasoning about the behaviour of e.g. $e(Pef)$, which in turn requires reasoning about
the (arbitrary) term e, on which no information is provided. The (essentially 
infinitary) normal forms of P and Q, however, can be proved to be essentially 
the same by reasoning about the syntactical structure of P and Q. Moreover, our 
up-to context technique enables an elegant and concise proof of the equivalence 
between P and Q (Sect. 6.4). 


Example 4. Our last example discusses the use of the cost monad as an instru- 
ment to facilitate a more intensional analysis of programs. In fact, we can use the 
ticking operation tick to perform cost analysis. For instance, we can consider the 
following variation of Curry’s and Turing’s fixed point combinators of Example 1,
obtained by adding the operation symbol tick after every λ-abstraction.


$$Y \triangleq \lambda y.\mathbf{tick}(\Delta\Delta), \qquad \Delta \triangleq \lambda x.\mathbf{tick}(y(\lambda z.\mathbf{tick}(xxz))),$$
$$Z \triangleq \Theta\Theta, \qquad \Theta \triangleq \lambda x.\mathbf{tick}(\lambda y.\mathbf{tick}(y(\lambda z.\mathbf{tick}(xxyz)))).$$

Every time a β-redex $(\lambda x.\mathbf{tick}(e))v$ is reduced, the ticking operation tick
increases an imaginary cost counter by one unit. Using ticking, we can provide
a more intensional analysis of the relationship between Y and Z, along the lines
of Sands’ improvement theory [62].


2 Meaning that two terms $e_1$, $e_2$ are tested for their applicative behaviour against all
terms of the form $E[e]$, $E[e']$, for any pair of terms $(e, e')$ stored in the environment.


3 Preliminaries: Monads and Algebraic Operations


In this section we recall some basic definitions and results needed in the rest 
of the paper. Unfortunately, there is no hope to be comprehensive, and thus we 
assume the reader to be familiar with basic domain theory [2] (in particular with
the notions of ω-complete (pointed) partial order (ω-cppo, for short), monotone,
and continuous functions), basic order theory [19], and basic category theory [46].
Additionally, we assume the reader to be acquainted with the notion of a Kleisli
triple [46] $\mathbb{T} = \langle T, \eta, -^{\dagger} \rangle$. As is customary, we use the notation $f^{\dagger} : TX \to TY$
for the Kleisli extension of $f : X \to TY$, and reserve the letter $\eta$ to denote
the unit of $\mathbb{T}$. Due to their equivalence, oftentimes we refer to Kleisli triples as
monads. 

Concerning notation, we try to follow [46] and [2], with the only exception
that we use the notation $(x_n)_n$ to denote an ω-chain $x_0 \sqsubseteq x_1 \sqsubseteq \cdots$ in
a domain $(X, \sqsubseteq, \bot)$. The notation $\mathbb{T} = \langle T, \eta, -^{\dagger} \rangle$ for an arbitrary Kleisli triple
is standard, but it is not very handy when dealing with multiple monads at
the same time. To fix this issue, we sometimes use the notation $\mathbb{T} = \langle T, t, -^{t} \rangle$
to denote a Kleisli triple. Additionally, when unambiguous we omit subscripts.
Finally, we denote by Set the category of sets and functions, and by Rel the 
category of sets and relations. We reserve the symbol 1 to denote the iden- 
tity function. Unless explicitly stated, we assume functors (and monads) to be 
functors (and monads) on Set. As a consequence, we write functors to refer to 
endofunctors on Set. 

We use monads to give operational semantics to our calculi. Following Moggi 
[49,50], we model notions of computation as monads, meaning that we use mon- 
ads as mathematical models of the kind of (side) effects computations may pro- 
duce. The following are examples of monads modelling relevant notions of com- 
putation. Due to space constraints, we omit several interesting examples such as 
the output, the exception, and the nondeterministic/powerset monad, for which 
the reader is referred to e.g. [50,73]. 


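Example 5 (Partiality). Partial computations are modelled by the partiality
(also called maybe) monad $\mathbb{M} = \langle M, m, -^{m} \rangle$. The carrier $MX$ of $\mathbb{M}$ is defined
as $\{\mathit{just}\ x \mid x \in X\} \cup \{\bot\}$, where $\bot$ is a special symbol denoting divergence.
The unit and Kleisli extension of $\mathbb{M}$ are defined as follows:


$$m(x) \triangleq \mathit{just}\ x, \qquad f^{m}(\mathit{just}\ x) \triangleq f(x), \qquad f^{m}(\bot) \triangleq \bot.$$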
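
The unit and Kleisli extension of Example 5 can be sketched in Python as follows. The encoding is an assumption made for illustration: `("just", x)` stands for just x, and `None` stands for ⊥, so divergence is represented by a value rather than by actual non-termination. The function `safe_recip` is a hypothetical partial computation used only to exercise the definitions.

```python
# Assumed encoding: ("just", x) for `just x`, None for the divergence symbol ⊥.
BOTTOM = None

def just(x):
    return ("just", x)

def unit_m(x):
    """Unit of the partiality monad: m(x) = just x."""
    return just(x)

def extend_m(f):
    """Kleisli extension f^m : MX -> MY of a function f : X -> MY."""
    def f_ext(mx):
        if mx is BOTTOM:        # f^m(⊥) = ⊥
            return BOTTOM
        _, x = mx               # f^m(just x) = f(x)
        return f(x)
    return f_ext

# Hypothetical partial computation: the reciprocal is undefined at 0.
def safe_recip(x):
    return BOTTOM if x == 0 else just(1 / x)

print(extend_m(safe_recip)(just(4)))
print(extend_m(safe_recip)(just(0)))
print(extend_m(safe_recip)(BOTTOM))
```

Extending the unit gives back the identity on MX, e.g. `extend_m(unit_m)(just(7))` returns `just(7)`, which is one of the Kleisli triple laws.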


Example 6 (Probabilistic Nondeterminism). In this example we assume sets to
be countable³. The (discrete) distribution monad $\mathbb{D} = \langle D, d, -^{d} \rangle$ has carrier
$DX = \{\mu : X \to [0,1] \mid \sum_{x} \mu(x) = 1\}$, whereas the maps $d$ and $-^{d}$ are defined
as follows (where $y \neq x$):


$$d(x)(x) = 1, \qquad d(x)(y) = 0, \qquad f^{d}(\mu)(y) = \sum_{x \in X} \mu(x) \cdot f(x)(y).$$


3 Although this is not strictly necessary, for simplicity we work with distributions over 
countable sets only, as the sets of values and normal forms are countable. 


Oftentimes, we write a distribution $\mu$ as a weighted formal sum. That is, we write
$\mu$ as the sum⁴ $\sum_{i \in I} p_i \cdot x_i$ such that $\mu(x) = \sum_{x_i = x} p_i$. $\mathbb{D}$ models probabilistic total
computations, according to the rationale that a (total) probabilistic program
evaluates to a distribution over values, the latter describing the possible results
of the evaluation. Finally, we model probabilistic partial computations using the
monad $\mathbb{DM} = \langle DM, dm, -^{dm} \rangle$. The carrier of $\mathbb{DM}$ is defined as $DMX = D(MX)$,
whereas the unit $dm$ is defined in the obvious way. For $f : X \to DMY$, define:


$$f^{dm}(\mu)(y) = \sum_{x \in X} \mu(\mathit{just}\ x) \cdot f(x)(y) + \mu(\bot) \cdot d(\bot)(y).$$

It is easy to see that $\mathbb{DM}$ is isomorphic to the subdistribution monad.
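
The two Kleisli extensions of Example 6 can be illustrated with a small Python sketch. It assumes finitely supported distributions, represented as dictionaries from outcomes to exact `Fraction` weights, and it reuses the assumed encoding `("just", x)` / `None` for the elements of MX. The helper names (`coin`, `flip_again_on_tails`, `step`) are hypothetical.

```python
from fractions import Fraction

BOTTOM = None  # assumed encoding of ⊥; ("just", x) encodes `just x`

def unit_d(x):
    """Unit of D: the Dirac distribution on x."""
    return {x: Fraction(1)}

def extend_d(f):
    """Kleisli extension for D: f^d(mu)(y) = sum over x of mu(x) * f(x)(y)."""
    def f_ext(mu):
        out = {}
        for x, p in mu.items():
            for y, q in f(x).items():
                out[y] = out.get(y, Fraction(0)) + p * q
        return out
    return f_ext

def extend_dm(f):
    """Kleisli extension for DM: the mass sitting on ⊥ stays on ⊥."""
    def f_ext(mu):
        out = {}
        for mx, p in mu.items():
            branch = {BOTTOM: Fraction(1)} if mx is BOTTOM else f(mx[1])
            for y, q in branch.items():
                out[y] = out.get(y, Fraction(0)) + p * q
        return out
    return f_ext

# Hypothetical example for D: flip a fair coin, and flip once more on tails.
def coin():
    return {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def flip_again_on_tails(x):
    return coin() if x == "T" else unit_d(x)

result = extend_d(flip_again_on_tails)(coin())

# Hypothetical example for DM: a computation that diverges half of the time.
half_diverging = {("just", 1): Fraction(1, 2), BOTTOM: Fraction(1, 2)}
step = lambda x: {("just", x + 1): Fraction(1)}
partial = extend_dm(step)(half_diverging)

print(result)
print(partial)
```

Using exact fractions rather than floats keeps the total mass equal to 1 without rounding, which makes the defining condition on DX easy to check.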


Example 7 (Cost). The cost (also known as ticking or improvement [62]) monad
$\mathbb{C} = \langle C, c, -^{c} \rangle$ has carrier $CX = M(\mathbb{N} \times X)$. The unit of $\mathbb{C}$ is defined as
$c(x) = \mathit{just}\ (0, x)$, whereas Kleisli extension is defined as follows:

$$f^{c}(\alpha) \triangleq \begin{cases} \bot & \text{if } \alpha = \bot, \text{ or } \alpha = \mathit{just}\ (n, x) \text{ and } f(x) = \bot, \\ \mathit{just}\ (n + m, y) & \text{if } \alpha = \mathit{just}\ (n, x) \text{ and } f(x) = \mathit{just}\ (m, y). \end{cases}$$

The cost monad is used to model the cost of (partial) computations. An element
of the form $\mathit{just}\ (n, x)$ models the result of a computation outputting the value
x with cost n (the latter being an abstract notion that can be instantiated to e.g.
the number of reduction steps performed). Partiality is modelled as the element
$\bot$, according to the rationale that we can assume all divergent computations to
have the same cost, so that such information need not be explicitly written (for
instance, measuring the number of reduction steps performed, we would have
that divergent computations all have cost $\infty$).
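
A minimal Python sketch of the cost monad of Example 7 follows, again under the assumed encoding `("just", (n, x))` for just (n, x) and `None` for ⊥. The helper `tick`, which adds one unit of cost to a computation, and the function `double_with_cost` are hypothetical and serve only to show how costs accumulate through Kleisli extension.

```python
BOTTOM = None  # assumed encoding of ⊥

def unit_c(x):
    """Unit of the cost monad: c(x) = just (0, x)."""
    return ("just", (0, x))

def extend_c(f):
    """Kleisli extension f^c: costs add up, and divergence propagates."""
    def f_ext(cx):
        if cx is BOTTOM:
            return BOTTOM
        _, (n, x) = cx
        fx = f(x)
        if fx is BOTTOM:
            return BOTTOM
        _, (m, y) = fx
        return ("just", (n + m, y))
    return f_ext

def tick(cx):
    """Hypothetical helper: charge one unit of cost to a computation."""
    if cx is BOTTOM:
        return BOTTOM
    _, (n, x) = cx
    return ("just", (n + 1, x))

# Hypothetical costed function: doubling costs one tick.
def double_with_cost(x):
    return tick(unit_c(2 * x))

twice = extend_c(double_with_cost)(extend_c(double_with_cost)(unit_c(3)))
print(twice)
```

Here `twice` records both the final value and the total number of ticks, which is the kind of information a cost-sensitive comparison of programs looks at.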


Example 8 (Global states). Let $\mathcal{L}$ be a set of public location names. We assume
the content of locations to be encoded as families of values (such as numerals or
booleans) and denote the collection of such values as $V$. A store (or state) is a
function $\sigma : \mathcal{L} \to V$. We write $S$ for the set of stores $V^{\mathcal{L}}$. The global state monad
$\mathbb{G} = \langle G, g, -^{g} \rangle$ has carrier $GX \triangleq (X \times S)^{S}$, whereas $g$ and $-^{g}$ are defined by:


$$g(x)(\sigma) = (x, \sigma), \qquad f^{g}(\alpha)(\sigma) = f(x')(\sigma'),$$


where $\alpha(\sigma) = (x', \sigma')$. It is straightforward to see that we can combine the global
state monad with the partiality monad, obtaining the monad $\mathbb{M} \otimes \mathbb{G}$ whose carrier
is $(M \otimes G)X \triangleq M(X \times S)^{S}$. In a similar fashion, we see that we can combine
the global state monad with $\mathbb{DM}$ and $\mathbb{C}$, as we are going to see in Remark 1.
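
The global state monad of Example 8 can be sketched in Python by representing an element of GX as a function from stores to (value, store) pairs. Stores are assumed to be dictionaries from location names to values, and the operations `get` and `put` are hypothetical primitives introduced only for this illustration.

```python
def unit_g(x):
    """Unit of G: return x and leave the store untouched."""
    return lambda store: (x, store)

def extend_g(f):
    """Kleisli extension f^g: run alpha, then run f on its result and new store."""
    def f_ext(alpha):
        def run(store):
            x1, store1 = alpha(store)
            return f(x1)(store1)
        return run
    return f_ext

# Hypothetical primitives reading and writing a location.
def get(loc):
    return lambda store: (store[loc], store)

def put(loc, value):
    # Build a fresh dictionary so that earlier stores are never mutated.
    return lambda store: (None, {**store, loc: value})

# Read location "l" and write back its successor.
increment = extend_g(lambda v: put("l", v + 1))(get("l"))
print(increment({"l": 41}))
```

Copying the dictionary in `put` keeps stores persistent, which matches the mathematical reading of a store as a function rather than as a mutable cell.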


Remark 1. The monads $\mathbb{DM}$ and $\mathbb{M} \otimes \mathbb{G}$ of Example 6 and Example 8, respectively,
are instances of two general constructions, namely the sum and tensor of effects 
[28]. Although these operations are defined on Lawvere theories [29,40], here we 
can rephrase them in terms of monads as follows. 


4 For simplicity, we write only those $p_i$s such that $p_i > 0$.
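

Proposition 1. Given a monad $\mathbb{T} = \langle T, t, -^{t} \rangle$, define the sum $\mathbb{TM}$ of $\mathbb{T}$ and $\mathbb{M}$
and the tensor $\mathbb{T} \otimes \mathbb{G}$ of $\mathbb{T}$ and $\mathbb{G}$, as the triples $\langle TM, tm, -^{tm} \rangle$ and
$\langle T \otimes G, t \otimes g, -^{t \otimes g} \rangle$, respectively. The carriers of the triples are defined as $TMX \triangleq T(MX)$
and $(T \otimes G)X \triangleq T(S \times X)^{S}$, whereas the maps $tm$ and $t \otimes g$ are defined as
$tm_X \triangleq t_{MX} \circ m_X$ and $(t \otimes g)_X \triangleq \mathrm{curry}\ t_{S \times X}$, respectively. Finally, define:


$$f^{tm} \triangleq (f_M)^{t}, \qquad f^{t \otimes g}(\alpha)(\sigma) \triangleq (\mathrm{uncurry}\ f)^{t}(\alpha(\sigma)),$$


where, for a function $f : X \to TMY$ we define $f_M : MX \to TMY$ as $f_M(\bot) \triangleq
t_{MY}(\bot)$, $f_M(\mathit{just}\ x) \triangleq f(x)$, and curry and uncurry are defined as usual. Then
$\mathbb{TM}$ and $\mathbb{T} \otimes \mathbb{G}$ are monads.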


Proving Proposition 1 is a straightforward exercise (the reader can also consult 
[28]). We notice that by tensoring $\mathbb{G}$ with $\mathbb{DM}$ we obtain a monad for probabilistic
imperative computations, whereas by tensoring $\mathbb{G}$ with $\mathbb{C}$ we obtain a monad for
imperative computations with cost. 


3.1 Algebraic Operations 


Monads provide an elegant way to structure effectful computations. However,
they do not offer any actual effect constructor. Following Plotkin and Power
[56-58], we use algebraic operations as effect producers. From an operational
perspective, algebraic operations are those operations whose behaviour is independent
of their continuations or, equivalently, of the environment in which they
are evaluated. Intuitively, that means that e.g. $E[e_1 \text{ or } e_2]$ is operationally equivalent
to $E[e_1] \text{ or } E[e_2]$, for any evaluation context $E$. Examples of algebraic
operations are given by (binary) nondeterministic and probabilistic choices as
well as primitives for raising exceptions and output operations.

Syntactically, algebraic operations are given via a signature $\Sigma$ consisting of a
set of operation symbols (uninterpreted operations) together with their arity (i.e.
their number of operands). Semantically, operation symbols are interpreted as
algebraic operations on monads. To any $n$-ary operation symbol⁵ $(\mathbf{op} : n) \in \Sigma$
and any set $X$ we associate a map $[\![\mathbf{op}]\!]_X : (TX)^n \to TX$ (so that we equip
$TX$ with a $\Sigma$-algebra structure [12]) such that $f^{\dagger}$ is a $\Sigma$-algebra morphism,
meaning that for any $f : X \to TY$ and elements $x_1, \ldots, x_n \in TX$ we have

\[ [\![\mathbf{op}]\!]_Y(f^{\dagger}(x_1), \ldots, f^{\dagger}(x_n)) = f^{\dagger}([\![\mathbf{op}]\!]_X(x_1, \ldots, x_n)). \]


Example 9. The partiality monad M usually comes with no operation, as the 
possibility of divergence is an implicit feature of any Turing complete language. 
However, it is sometimes useful to add an explicit divergence operation (for 
instance, in strongly normalising calculi). For that, we consider the signature
$\Sigma_M \triangleq \{\Omega : 0\}$. Having arity zero, the operation $\Omega$ acts as a constant, and has
semantics $[\![\Omega]\!] = \bot$. Since $f^{\dagger}(\bot) = \bot$, we see that $\Omega$ is indeed an algebraic
operation on $\mathbb{M}$.


5 Here $\mathbf{op}$ denotes the operation symbol, whereas $n \geq 0$ denotes its arity.

For the distribution monad $\mathbb{D}$ we define the signature $\Sigma_D \triangleq \{\mathbf{or} : 2\}$. The
intended semantics of a program $e_1 \text{ or } e_2$ is to evaluate to $e_i$ ($i \in \{1, 2\}$) with
probability 0.5. The interpretation of $\mathbf{or}$ is defined by $[\![\mathbf{or}]\!](\mu, \nu)(x) \triangleq 0.5 \cdot \mu(x) + 0.5 \cdot \nu(x)$.
It is easy to see that $\mathbf{or}$ is an algebraic operation on $\mathbb{D}$, and that it
trivially extends to $\mathbb{DM}$.
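The algebraicity of fair choice can be checked on a concrete instance. The sketch below assumes finite distributions represented as dictionaries from outcomes to `Fraction` probabilities; the helper names and the test data `mu`, `nu`, `f` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def d_ext(f):
    """Kleisli extension for finite distributions (dict: outcome -> probability)."""
    def extended(mu):
        out = {}
        for x, p in mu.items():
            for y, q in f(x).items():
                out[y] = out.get(y, Fraction(0)) + p * q
        return out
    return extended


def op_or(mu, nu):
    """Interpretation of or: 0.5 * mu(x) + 0.5 * nu(x) for every outcome x."""
    half = Fraction(1, 2)
    return {x: half * mu.get(x, 0) + half * nu.get(x, 0)
            for x in set(mu) | set(nu)}


# Hypothetical test data.
mu = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
nu = {"b": Fraction(1)}
f = lambda x: {x.upper(): Fraction(1, 4), "z": Fraction(3, 4)}

# Algebraicity: extending f commutes with the operation.
lhs = d_ext(f)(op_or(mu, nu))
rhs = op_or(d_ext(f)(mu), d_ext(f)(nu))
```

Both sides assign 1/24 to "A", 5/24 to "B" and 3/4 to "z".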

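Finally, for the cost monad $\mathbb{C}$ we define the signature $\Sigma_C \triangleq \{\mathbf{tick} : 1\}$. The
intended semantics of $\mathbf{tick}$ is to add a unit to the cost counter:

\[ [\![\mathbf{tick}]\!](\bot) \triangleq \bot, \qquad [\![\mathbf{tick}]\!](\mathit{just}\ (n, x)) \triangleq \mathit{just}\ (n + 1, x). \]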
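A small sketch of the cost counter follows. It assumes that the Kleisli extension of the cost monad adds up costs and propagates divergence (this part is an assumption of the sketch, since the definition is not restated in this section), with `None` standing for $\bot$ and a pair `(n, x)` for $\mathit{just}\ (n, x)$.

```python
BOTTOM = None  # stands for divergence


def c_ext(f):
    """Assumed Kleisli extension of the cost monad: costs are summed."""
    def extended(cx):
        if cx is BOTTOM:
            return BOTTOM
        n, x = cx
        fx = f(x)
        if fx is BOTTOM:
            return BOTTOM
        m, y = fx
        return (n + m, y)
    return extended


def tick(cx):
    """Interpretation of tick: add one unit of cost, keep bottom as bottom."""
    return BOTTOM if cx is BOTTOM else (cx[0] + 1, cx[1])


# Hypothetical continuation costing two units.
f = lambda x: (2, x + 1)
c = (3, 10)
lhs = c_ext(f)(tick(c))   # tick first, then continue
rhs = tick(c_ext(f)(c))   # continue first, then tick
```

Under these assumptions the two orders agree, which is what algebraicity of tick requires.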


The framework we have just described works fine for modelling operations
with finite arity, but does not allow us to handle operations with infinitary arity.
This is witnessed, for instance, by imperative calculi with global stores, where it
is natural to have operations of the form $\mathbf{get}_\ell(x.k)$ with the following intended
semantics: $\mathbf{get}_\ell(x.k)$ reads the content of the location $\ell$, say it is a value $v$, and
continues as $k[v/x]$. In order to take such operations into account, we follow [58]
and work with generalised operations.

A generalised operation (operation, for short) on a set $X$ is a function
$\omega : P \times X^I \to X$. The set $P$ is called the parameter set of the operation, whereas
the (index) set $I$ is called the arity of the operation. A generalised operation
$\omega : P \times X^I \to X$ thus takes as arguments a parameter $p$ (such as a location
name) and a map $\kappa : I \to X$ giving for each index $i \in I$ the argument $\kappa(i)$
to pass to $\omega$. Syntactically, generalised operations are given via a signature $\Sigma$
consisting of a set of elements of the form $\mathbf{op} : P \rightsquigarrow I$ (the latter being nothing
but a notation denoting that the operation symbol $\mathbf{op}$ has parameter set $P$ and
index set $I$). Semantically, an interpretation of an operation symbol $\mathbf{op} : P \rightsquigarrow I$
on a monad $\mathbb{T}$ associates to any set $X$ a map $[\![\mathbf{op}]\!]_X : P \times (TX)^I \to TX$ such
that for any $f : X \to TY$, $p \in P$, and $\kappa : I \to TX$:

\[ f^{\dagger}([\![\mathbf{op}]\!]_X(p, \kappa)) = [\![\mathbf{op}]\!]_Y(p, f^{\dagger} \circ \kappa). \]


If $\mathbb{T}$ comes with an interpretation for operation symbols in $\Sigma$, we say that $\mathbb{T}$ is
$\Sigma$-algebraic.

It is easy to see that, by taking the one-element set $1 = \{*\}$ as parameter set
and a finite set as arity set, generalised operations subsume finitary operations.
For simplicity, we use the notation $\mathbf{op} : n$ in place of $\mathbf{op} : 1 \rightsquigarrow n$, and write
$\mathbf{op}(x_1, \ldots, x_n)$ in place of $\mathbf{op}(*, i \mapsto x_i)$.


Example 10. For the global state monad we consider the signature
$\Sigma_G \triangleq \{\mathbf{set}_\ell : V \rightsquigarrow 1,\ \mathbf{get}_\ell : 1 \rightsquigarrow V \mid \ell \in L\}$. From a computational perspective, such operations
are used to build programs of the form $\mathbf{set}_\ell(v, e)$ and $\mathbf{get}_\ell(x.e)$. The former stores
the value $v$ in the location $\ell$ and continues as $e$, whereas the latter reads the
content of the location $\ell$, say it is $v$, and continues as $e[v/x]$. Here $e$ is used as
the description of a function $\kappa_e$ from values to terms defined by $\kappa_e(v) \triangleq e[v/x]$.
The interpretation of the new operations on $\mathbb{G}$ is standard:

\[ [\![\mathbf{set}_\ell]\!](v, \alpha)(\sigma) \triangleq \alpha(\sigma[\ell := v]), \qquad [\![\mathbf{get}_\ell]\!](\kappa)(\sigma) \triangleq \kappa(\sigma(\ell))(\sigma). \]

Straightforward calculations show that indeed $\mathbf{set}_\ell$ and $\mathbf{get}_\ell$ are algebraic operations
on $\mathbb{G}$. Moreover, such operations can be easily extended to the partial
global state monad $\mathbb{M} \otimes \mathbb{G}$ as well as to the probabilistic (partial) global store
monad $\mathbb{DM} \otimes \mathbb{G}$. These extensions share a common pattern, which is nothing
but an instance of the tensor of effects. In fact, given a $\Sigma_T$-algebraic monad $\mathbb{T}$
we can define the signature $\Sigma_{T \otimes G}$ as $\Sigma_T \cup \Sigma_G$, and observe that $\mathbb{T} \otimes \mathbb{G}$ is
$\Sigma_{T \otimes G}$-algebraic. We refer the reader to [28] for details. Here we simply notice
that we can define the interpretation $[\![\mathbf{op}]\!]^{T \otimes G}$ of $\mathbf{op} : P \rightsquigarrow V$ on $\mathbb{T} \otimes \mathbb{G}$ as
$[\![\mathbf{op}]\!]^{T \otimes G}_X(p, \kappa)(\sigma) \triangleq [\![\mathbf{op}]\!]^{T}_{S \times X}(p, v \mapsto \kappa(v)(\sigma))$, where $[\![\mathbf{op}]\!]^{T}$ is the interpretation
of $\mathbf{op}$ on $\mathbb{T}$ (the interpretations of $\mathbf{set}_\ell$ and $\mathbf{get}_\ell$ are straightforward).
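The interpretations of the two store operations on the global state monad can be sketched as follows, again with stores modelled as dictionaries and computations as functions from stores to (result, store) pairs. The function names and the sample program are hypothetical.

```python
def g_unit(x):
    return lambda store: (x, store)


def g_ext(f):
    """Kleisli extension of the global state monad."""
    def extended(alpha):
        def run(store):
            x1, store1 = alpha(store)
            return f(x1)(store1)
        return run
    return extended


def set_loc(loc):
    """set_l(v, alpha)(sigma) = alpha(sigma[l := v])."""
    return lambda v, alpha: lambda store: alpha({**store, loc: v})


def get_loc(loc):
    """get_l(kappa)(sigma) = kappa(sigma(l))(sigma)."""
    return lambda kappa: lambda store: kappa(store[loc])(store)


# Program set_l(5, get_l(x. return x + 1)).
prog = set_loc("l")(5, get_loc("l")(lambda x: g_unit(x + 1)))

# Algebraicity of get_l on one instance: f extended commutes with the operation.
kappa = lambda v: g_unit(v * 2)
f = lambda x: g_unit(str(x))
lhs = g_ext(f)(get_loc("l")(kappa))
rhs = get_loc("l")(lambda v: g_ext(f)(kappa(v)))
```

Running `prog` on any store that contains "l" overwrites the location with 5 and returns 6.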


Monads and algebraic operations provide mathematical abstractions to struc- 
ture and produce effectful computations. However, in order to give operational 
semantics to, e.g., probabilistic calculi [17] we need monads to account for infinitary
computational behaviours. We thus look at $\Sigma$-continuous monads.


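Definition 1. A $\Sigma$-algebraic monad $\mathbb{T} = \langle T, \eta, -^{\dagger}\rangle$ is $\Sigma$-continuous (cf. [24])
if to any set $X$ is associated an order $\sqsubseteq_X$ and an element $\bot_X \in TX$ such that
$\langle TX, \sqsubseteq_X, \bot_X\rangle$ is an $\omega$-cppo, and for all $(\mathbf{op} : P \rightsquigarrow I) \in \Sigma$, $f, f_n, g : X \to TY$,
$\kappa, \kappa_n, \nu : I \to TX$, $x, x_n, y \in TX$, we have $f^{\dagger}(\bot) = \bot$ and:

\[
\begin{aligned}
\kappa \sqsubseteq \nu &\implies [\![\mathbf{op}]\!](p, \kappa) \sqsubseteq [\![\mathbf{op}]\!](p, \nu), & [\![\mathbf{op}]\!](p, \textstyle\bigsqcup_n \kappa_n) &= \textstyle\bigsqcup_n [\![\mathbf{op}]\!](p, \kappa_n), \\
f \sqsubseteq g &\implies f^{\dagger} \sqsubseteq g^{\dagger}, & (\textstyle\bigsqcup_n f_n)^{\dagger} &= \textstyle\bigsqcup_n f_n^{\dagger}, \\
x \sqsubseteq y &\implies f^{\dagger}(x) \sqsubseteq f^{\dagger}(y), & f^{\dagger}(\textstyle\bigsqcup_n x_n) &= \textstyle\bigsqcup_n f^{\dagger}(x_n).
\end{aligned}
\]

When clear from the context, we will omit subscripts in $\bot_X$ and $\sqsubseteq_X$.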


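Example 11. The monads $\mathbb{M}$, $\mathbb{DM}$, $\mathbb{GM}$, and $\mathbb{C}$ are $\Sigma$-continuous. The order on
$MX$ and $CX$ is the flat ordering $\sqsubseteq$ defined by $x \sqsubseteq y \iff x = \bot$ or $x = y$,
whereas the order on $DMX$ is defined by $\mu \sqsubseteq \nu \iff \forall x \in X.\ \mu(\mathit{just}\ x) \leq \nu(\mathit{just}\ x)$.
Finally, the order on $GMX$ is defined pointwise from the flat ordering
on $M(X \times S)$.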


Having introduced the notion of a $\Sigma$-continuous monad, we can now define
our vehicle calculus $\Lambda_\Sigma$ and its monadic operational semantics.


4 A Computational Call-by-value Calculus 
with Algebraic Operations 


In this section we define the calculus $\Lambda_\Sigma$. $\Lambda_\Sigma$ is an untyped $\lambda$-calculus
parametrised by a signature of operation symbols, and corresponds to the coarse-grain
[44] version of the calculus studied in [15]. Formally, terms of $\Lambda_\Sigma$ are defined
by the following grammar, where $x$ ranges over a countably infinite set of variables
and $\mathbf{op}$ is a generalised operation symbol in $\Sigma$.

\[ e ::= x \mid \lambda x.e \mid ee \mid \mathbf{op}(p, x.e). \]

A value is either a variable or a $\lambda$-abstraction. We denote by $\Lambda$ the collection
of terms and by $\mathcal{V}$ the collection of values of $\Lambda_\Sigma$. For an operation symbol
$\mathbf{op} : P \rightsquigarrow I$, we assume the set $I$ to be encoded by some subset of $\mathcal{V}$ (using
e.g. Church's encoding). In particular, in a term of the form $\mathbf{op}(p, x.e)$, $e$ acts
as a function in the variable $x$ that takes as input a value. Notice also how
parameters $p \in P$ are part of the syntax. For simplicity, we ignore the specific
subset of values used to encode elements of $I$, and simply write $\mathbf{op} : P \rightsquigarrow \mathcal{V}$ for
operation symbols in $\Sigma$.

We adopt standard syntactical conventions as in [5] (notably the so-called 
variable convention). The notion of a free (resp. bound) variable is defined as 
usual (notice that the variable $x$ is bound in $\mathbf{op}(p, x.e)$). As is customary, we
identify terms up to renaming of bound variables and say that a term is closed if
it has no free variables (and that it is open, otherwise). Finally, we write $f[e/x]$
for the capture-free substitution of the term $e$ for all free occurrences of $x$ in $f$.
In particular, $\mathbf{op}(p, x'.f)[e/x]$ is defined as $\mathbf{op}(p, x'.f[e/x])$.

Before giving $\Lambda_\Sigma$ a call-by-value operational semantics, it is useful to remark on a
couple of points. First of all, testing terms according to their (possibly infinitary)
normal forms obviously requires working with open terms. Indeed, in order to
inspect the intensional behaviour of a value $\lambda x.e$, one has to inspect the intensional
behaviour of $e$, which is an open term. As a consequence, contrary to the
usual practice, we give operational semantics to both open and closed terms.
Actually, the very distinction between open and closed terms is not that meaningful
in this context, and thus we simply speak of terms. Second, we notice that
values constitute a syntactic category defined independently of the operational
semantics of the calculus: values are just variables and $\lambda$-abstractions. However,
giving operational semantics to arbitrary terms we are interested in richer collections
of irreducible expressions, i.e. expressions that cannot be simplified any
further. Such collections differ according to the operational semantics
adopted. For instance, in a call-by-name setting it is natural to regard the
term $x((\lambda x.x)v)$ as a terminal expression (it being a head normal form), whereas
in a call-by-value setting $x((\lambda x.x)v)$ can be further simplified to $xv$, which in
turn should be regarded as a terminal expression.

We now give $\Lambda_\Sigma$ a monadic call-by-value operational semantics [15], postponing
the definition of monadic call-by-name operational semantics to Sect. 6.4.
Recall that a (call-by-value) evaluation context [22] is a term with a single hole
$[-]$ defined by the following grammar, where $e \in \Lambda$ and $v \in \mathcal{V}$:

\[ E ::= [-] \mid Ee \mid vE. \]

We write $E[e]$ for the term obtained by substituting the term $e$ for the hole $[-]$
in $E$.

Following [38], we define a stuck term as a term of the form $E[xv]$. Intuitively,
a stuck term is an expression whose evaluation is stuck. For instance, the term
$e \triangleq y(\lambda x.x)$ is stuck. Obviously, $e$ is not a value, but at the same time it cannot
be simplified any further, as $y$ is a variable, and not a $\lambda$-abstraction. Following
this intuition, we define the collection $\mathcal{E}$ of eager normal forms (enfs hereafter)
as the collection of values and stuck terms. We let letters $s, t, \ldots$ range over
elements in $\mathcal{E}$.


Lemma 1. Any term $e$ is either a value $v$, or can be uniquely decomposed as
either $E[vw]$ or $E[\mathbf{op}(p, x.f)]$.


The operational semantics of $\Lambda_\Sigma$ is defined with respect to a $\Sigma$-continuous monad
$\mathbb{T} = \langle T, \eta, -^{\dagger}\rangle$ relying on Lemma 1. More precisely, we define a call-by-value
evaluation function $[\![-]\!]$ mapping each term to an element in $T\mathcal{E}$. For instance,
evaluating a probabilistic term $e$ we obtain a distribution over eager normal
forms (plus bottom), the latter being either values (meaning that the evaluation
of $e$ terminates) or stuck terms (meaning that the evaluation of $e$ got stuck at
some point).


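Definition 2. Define the $\mathbb{N}$-indexed family of maps $[\![-]\!]_n : \Lambda \to T\mathcal{E}$ as follows:

\[
\begin{aligned}
[\![e]\!]_0 &\triangleq \bot, \\
[\![v]\!]_{n+1} &\triangleq \eta(v), \\
[\![E[xv]]\!]_{n+1} &\triangleq \eta(E[xv]), \\
[\![E[(\lambda x.e)v]]\!]_{n+1} &\triangleq [\![E[e[v/x]]]\!]_n, \\
[\![E[\mathbf{op}(p, x.e)]]\!]_{n+1} &\triangleq [\![\mathbf{op}]\!]_{\mathcal{E}}(p, v \mapsto [\![E[e[v/x]]]\!]_n).
\end{aligned}
\]

The monad $\mathbb{T}$ being $\Sigma$-continuous, we see that the sequence $([\![e]\!]_n)_n$ forms
an $\omega$-chain in $T\mathcal{E}$, so that we can define $[\![e]\!]$ as $\bigsqcup_n [\![e]\!]_n$. Moreover, exploiting
$\Sigma$-continuity of $\mathbb{T}$ we see that $[\![-]\!]$ is continuous.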
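The approximants of Definition 2 can be sketched for the probabilistic instance, with binary fair choice as the only operation. The encoding below is an assumption made for illustration: terms are nested tuples, the string `"bottom"` stands for $\bot$, an evaluation context is a Python function that plugs its hole, and substitution assumes that bound variables are distinct from the free variables of the substituted value. All names are hypothetical.

```python
from fractions import Fraction

BOT = "bottom"  # stands for divergence

# Terms: ("var", x) | ("lam", x, e) | ("app", e1, e2) | ("or", e1, e2)


def is_value(e):
    return e[0] in ("var", "lam")


def subst(e, x, v):
    """e[v/x], assuming no bound variable of e is free in v."""
    tag = e[0]
    if tag == "var":
        return v if e[1] == x else e
    if tag == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, v))
    return (tag, subst(e[1], x, v), subst(e[2], x, v))


def decompose(e):
    """Split a non-value e into (E, redex), with E a hole-plugging function."""
    if e[0] == "app":
        e1, e2 = e[1], e[2]
        if not is_value(e1):
            ctx, redex = decompose(e1)
            return (lambda h: ("app", ctx(h), e2)), redex
        if not is_value(e2):
            ctx, redex = decompose(e2)
            return (lambda h: ("app", e1, ctx(h))), redex
    return (lambda h: h), e   # e is itself of the form v w or e1 or e2


def add_scaled(acc, dist, weight):
    for k, p in dist.items():
        acc[k] = acc.get(k, Fraction(0)) + weight * p


def approx(e, n):
    """n-th approximant of e: a dict from enfs (or BOT) to probabilities."""
    if n == 0:
        return {BOT: Fraction(1)}
    if is_value(e):
        return {e: Fraction(1)}
    ctx, redex = decompose(e)
    if redex[0] == "or":
        out = {}
        add_scaled(out, approx(ctx(redex[1]), n - 1), Fraction(1, 2))
        add_scaled(out, approx(ctx(redex[2]), n - 1), Fraction(1, 2))
        return out
    fun, arg = redex[1], redex[2]
    if fun[0] == "var":                 # stuck term E[x v]
        return {e: Fraction(1)}
    return approx(ctx(subst(fun[2], fun[1], arg)), n - 1)   # beta step


I = ("lam", "x", ("var", "x"))
delta = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
omega = ("app", delta, delta)
term = ("or", ("app", I, I), omega)     # (I I) or Omega
stuck = ("app", ("var", "y"), I)        # y I
```

For the sample term, every approximant with enough fuel assigns one half to the identity and one half to bottom, since the diverging branch never produces an eager normal form.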

We compare the behaviour of terms of $\Lambda_\Sigma$ relying on the notion of an
effectful eager normal form (bi)simulation, the extension of eager normal form
(bi)simulation [38] to calculi with algebraic effects. In order to account for effectful
behaviours, we follow [15] and parametrise our notions of equivalence and
refinement by relators [6,71].


5 Relators 


The notion of a relator for a functor $T$ (on Set) [71] (also called lax extension of $T$
[6]) is a construction lifting a relation $R$ between two sets $X$ and $Y$ to a relation
$\Gamma R$ between $TX$ and $TY$. Besides their applications in categorical topology [6]
and coalgebra [71], relators have been recently used to study notions of applicative
bisimulation [15], logic-based equivalence [67], and bisimulation-based distances
[23] for $\lambda$-calculi extended with algebraic effects. Moreover, several forms
of monadic lifting [25,32] resembling relators have been used to study abstract
notions of logical relations [55,61].

Before defining relators formally, it is useful to recall some background
notions on (binary) relations. The reader is referred to [26] for further details. We
denote by Rel the category of sets and relations, and use the notation $R : X \nrightarrow Y$
for a relation $R$ between sets $X$ and $Y$. Given relations $R : X \nrightarrow Y$ and
$S : Y \nrightarrow Z$, we write $S \circ R : X \nrightarrow Z$ for their composition, and $\mathsf{I}_X : X \nrightarrow X$ for
the identity relation on $X$. Finally, we recall that for all sets $X, Y$, the hom-set
$\mathrm{Rel}(X, Y)$ has a complete lattice structure, meaning that we can define relations
both inductively and coinductively.

Given a relation $R : X \nrightarrow Y$, we denote by $R^{\circ} : Y \nrightarrow X$ its dual (or
opposite) relation and by $-_{\circ} : \mathrm{Set} \to \mathrm{Rel}$ the graph functor mapping each
function $f : X \to Y$ to its graph $f_{\circ} : X \nrightarrow Y$. The functor $-_{\circ}$ being faithful, we
will often write $f : X \to Y$ in place of $f_{\circ} : X \nrightarrow Y$. It is useful to keep in mind
the pointwise reading of relations of the form $g^{\circ} \circ S \circ f$, for a relation $S : Z \nrightarrow W$
and functions $f : X \to Z$, $g : Y \to W$:

\[ (g^{\circ} \circ S \circ f)(x, y) = S(f(x), g(y)). \]

Given $R : X \nrightarrow Y$, we can thus express a generalised monotonicity condition in a
pointfree fashion using the inclusion $R \subseteq g^{\circ} \circ S \circ f$. Finally, since we are interested
in preorder and equivalence relations, we recall that a relation $R : X \nrightarrow X$ is
reflexive if $\mathsf{I}_X \subseteq R$, transitive if $R \circ R \subseteq R$, and symmetric if $R \subseteq R^{\circ}$. We can
now define relators formally.
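The pointwise reading of composites with graphs and converses can be checked on small finite sets. In the sketch below a relation is a Python set of pairs; the sets `X`, `Y`, the relation `S` and the functions `f`, `g` are hypothetical test data.

```python
def compose(S, R):
    """S o R = {(x, z) | there is y with (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}


def converse(R):
    return {(y, x) for (x, y) in R}


def graph(f, domain):
    return {(x, f(x)) for x in domain}


X, Y = {0, 1, 2, 3}, {"a", "bb", "ccc"}
S = {(m, n) for m in range(4) for n in range(4) if m < n}   # strict order on 0..3
f = lambda x: x % 2     # f : X -> Z
g = len                 # g : Y -> W

# Composite of the converse of the graph of g, S, and the graph of f.
lhs = compose(converse(graph(g, Y)), compose(S, graph(f, X)))
# Pointwise reading: x is related to y iff S(f(x), g(y)).
rhs = {(x, y) for x in X for y in Y if (f(x), g(y)) in S}
```

The pair (1, "a") is excluded on both sides because 1 < 1 fails, whereas (0, "a") is included.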


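Definition 3. A relator for a functor $T$ (on Set) is a set-indexed family of maps
$(R : X \nrightarrow Y) \mapsto (\Gamma R : TX \nrightarrow TY)$ satisfying conditions (rel 1)-(rel 4). We say
that $\Gamma$ is conversive if it additionally satisfies condition (rel 5).

\[
\begin{aligned}
\mathsf{I}_{TX} &\subseteq \Gamma(\mathsf{I}_X), && \text{(rel 1)} \\
\Gamma S \circ \Gamma R &\subseteq \Gamma(S \circ R), && \text{(rel 2)} \\
Tf \subseteq \Gamma f, &\quad (Tf)^{\circ} \subseteq \Gamma f^{\circ}, && \text{(rel 3)} \\
R \subseteq S &\implies \Gamma R \subseteq \Gamma S, && \text{(rel 4)} \\
\Gamma(R^{\circ}) &= (\Gamma R)^{\circ}. && \text{(rel 5)}
\end{aligned}
\]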


Conditions (rel 1), (rel 2), and (rel 4) are rather standard⁶. As we will
see, condition (rel 4) makes the defining functional of (bi)simulation relations 
monotone, whereas conditions (rel 1) and (rel 2) make notions of (bi)similarity 
reflexive and transitive. Similarly, condition (rel 5) makes notions of bisimi- 
larity symmetric. Condition (rel 3), which actually consists of two conditions, 
states that relators behave as expected when acting on (graphs of) functions. 
In [15,43] a kernel preservation condition is required in place of (rel 3). Such 
a condition is also known as stability in [27]. Stability requires the equality
$\Gamma(g^{\circ} \circ R \circ f) = (Tg)^{\circ} \circ \Gamma R \circ Tf$ to hold. It is easy to see that a relator always
satisfies stability (see Corollary II.1.4.4 in [26]).

Relators provide a powerful abstraction of notions of ‘relation lifting’, as 
witnessed by the numerous examples of relators we are going to discuss. However, 
before discussing such examples, we introduce the notion of a relator for a monad 
or lax extension of a monad. In fact, since we modelled computational effects as 
monads, it seems natural to define the notion of a relator for a monad (and not 
just for a functor). 


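6 Notice that since $\mathsf{I} = (1)_{\circ}$ we can derive condition (rel 1) from condition (rel 3).

Definition 4. Let $\mathbb{T} = \langle T, \eta, -^{\dagger}\rangle$ be a monad, and $\Gamma$ be a relator for $T$. We say
that $\Gamma$ is a relator for $\mathbb{T}$ if it satisfies the following conditions:

\[
\begin{aligned}
R &\subseteq \eta_Y^{\circ} \circ \Gamma R \circ \eta_X, && \text{(rel 7)} \\
R \subseteq g^{\circ} \circ \Gamma S \circ f &\implies \Gamma R \subseteq (g^{\dagger})^{\circ} \circ \Gamma S \circ f^{\dagger}. && \text{(rel 8)}
\end{aligned}
\]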


Finally, we observe that the collection of relators is closed under specific 
operations (see [43]). 


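Proposition 2. Let $T, U$ be functors, and let $UT$ denote their composition.
Moreover, let $\Gamma, \Delta$ be relators for $T$ and $U$, respectively, and $\{\Gamma_i\}_{i \in I}$ be a family
of relators for $T$. Then:

1. The map $\Delta\Gamma$ defined by $\Delta\Gamma R \triangleq \Delta(\Gamma R)$ is a relator for $UT$.
2. The maps $\bigwedge_{i \in I} \Gamma_i$ and $\Gamma^{\circ}$ defined by $(\bigwedge_{i \in I} \Gamma_i)R \triangleq \bigcap_{i \in I} \Gamma_i R$ and $\Gamma^{\circ} R \triangleq (\Gamma R^{\circ})^{\circ}$, respectively, are relators for $T$.
3. Additionally, if $\Gamma$ is a relator for a monad $\mathbb{T}$, then so are $\bigwedge_{i \in I} \Gamma_i$ and $\Gamma^{\circ}$.
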
Example 12. For the partiality monad $\mathbb{M}$ we define the set-indexed family of
maps $\hat{M} : \mathrm{Rel}(X, Y) \to \mathrm{Rel}(MX, MY)$ as:

\[ \mathfrak{x}\ \hat{M}R\ \mathfrak{y} \iff (\mathfrak{x} = \bot) \lor (\exists x \in X.\ \exists y \in Y.\ \mathfrak{x} = \mathit{just}\ x \land \mathfrak{y} = \mathit{just}\ y \land x\ R\ y). \]

The mapping $\hat{M}$ describes the structure of the usual simulation clause for partial
computations, whereas $\hat{M}^{\circ}$ describes the corresponding co-simulation clause.
It is easy to see that $\hat{M}$ is a relator for $\mathbb{M}$. By Proposition 2, the map $\hat{M} \wedge \hat{M}^{\circ}$ is a
conversive relator for $\mathbb{M}$. It is immediate to see that the latter relator describes
the structure of the usual bisimulation clause for partial computations.
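On finite data, the simulation and bisimulation clauses for partial computations can be sketched as below. The encoding is an assumption for illustration: `None` stands for $\bot$, `("just", x)` for $\mathit{just}\ x$, and a relation is a set of pairs; the function names are hypothetical.

```python
BOTTOM = None  # stands for divergence


def lift_M(R):
    """Simulation clause: the left side is bottom, or both sides are
    `just` values whose contents are related by R."""
    def related(mx, my):
        if mx is BOTTOM:
            return True
        return my is not BOTTOM and (mx[1], my[1]) in R
    return related


def lift_M_bisim(R):
    """Meet of the relator with its converse: the clause in both directions."""
    conv = {(y, x) for (x, y) in R}
    return lambda mx, my: lift_M(R)(mx, my) and lift_M(conv)(my, mx)


R = {(1, "one")}
```

A diverging computation is simulated by anything, but it is bisimilar only to another diverging computation.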


Example 13. For the distribution monad we define the relator $\hat{D}$ relying on the
notion of a coupling and results from optimal transport [72]. Recall that a coupling
for $\mu \in D(X)$ and $\nu \in D(Y)$ is a joint distribution $\omega \in D(X \times Y)$
such that $\mu = \sum_{y \in Y} \omega(-, y)$ and $\nu = \sum_{x \in X} \omega(x, -)$. We denote the set of
couplings of $\mu$ and $\nu$ by $\Omega(\mu, \nu)$. Define the (set-indexed) map $\hat{D} : \mathrm{Rel}(X, Y) \to \mathrm{Rel}(DX, DY)$ as follows:

\[ \mu\ \hat{D}R\ \nu \iff \exists \omega \in \Omega(\mu, \nu).\ \forall x, y.\ \omega(x, y) > 0 \implies x\ R\ y. \]

We can show that $\hat{D}$ is a relator for $\mathbb{D}$ relying on Strassen's Theorem [69], which
shows that $\hat{D}$ can be characterised universally (i.e. using a universal quantification).


Theorem 1 (Strassen's Theorem [69]). For all $\mu \in DX$, $\nu \in DY$, and
$R : X \nrightarrow Y$, we have: $\mu\ \hat{D}R\ \nu \iff \forall \mathcal{X} \subseteq X.\ \mu(\mathcal{X}) \leq \nu(R[\mathcal{X}])$.
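Both characterisations can be tried out on small finite distributions. The sketch below represents distributions as dictionaries with `Fraction` values and checks, for hypothetical test data, that a given joint distribution is a coupling supported in the relation, and that the universal condition of Theorem 1 holds. Only subsets of the points listed in `mu` are enumerated, which suffices because adding points of mass zero can only enlarge the image.

```python
from fractions import Fraction
from itertools import chain, combinations


def is_coupling(omega, mu, nu):
    """Check that the two marginals of omega are mu and nu."""
    left, right = {}, {}
    for (x, y), p in omega.items():
        left[x] = left.get(x, 0) + p
        right[y] = right.get(y, 0) + p
    drop0 = lambda d: {k: p for k, p in d.items() if p != 0}
    return drop0(left) == drop0(mu) and drop0(right) == drop0(nu)


def supported_in(omega, R):
    """Every pair with positive mass is related by R."""
    return all((x, y) in R for (x, y), p in omega.items() if p > 0)


def strassen(mu, nu, R):
    """mu(A) <= nu(R[A]) for every subset A of the points listed in mu."""
    xs = list(mu)
    subsets = chain.from_iterable(
        combinations(xs, k) for k in range(len(xs) + 1))
    for A in subsets:
        image = {y for (x, y) in R if x in A}
        if sum(mu[x] for x in A) > sum(nu.get(y, 0) for y in image):
            return False
    return True


mu = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
nu = {1: Fraction(1, 4), 2: Fraction(3, 4)}
R = {("a", 1), ("a", 2), ("b", 2)}
omega = {("a", 1): Fraction(1, 4), ("a", 2): Fraction(1, 4),
         ("b", 2): Fraction(1, 2)}
R_small = {("a", 1), ("b", 2)}   # too small: "a" has mass 1/2 but 1 only 1/4
```

With the smaller relation the subset containing only "a" violates the inequality, so no coupling supported in it can exist.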


As a corollary of Theorem 1, we see that $\hat{D}$ describes the defining clause of
Larsen-Skou bisimulation for Markov chains (based on full distributions) [34].
Finally, we observe that $\widehat{DM} \triangleq \hat{D}\hat{M}$ is a relator for $\mathbb{DM}$.


Example 14. For relations $R : X \nrightarrow Y$, $S : X' \nrightarrow Y'$, let $R \times S : X \times X' \nrightarrow Y \times Y'$
be defined as $(R \times S)((x, x'), (y, y')) \iff R(x, y) \land S(x', y')$. We define the relator
$\hat{C} : \mathrm{Rel}(X, Y) \to \mathrm{Rel}(CX, CY)$ for the cost monad $\mathbb{C}$ as $\hat{C}R \triangleq \hat{M}(\geq \times R)$, where
$\geq$ denotes the opposite of the natural ordering on $\mathbb{N}$. It is straightforward to see
that $\hat{C}$ is indeed a relator for $\mathbb{C}$. The use of the opposite of the natural order
in the definition of $\hat{C}$ captures the idea that we use $\hat{C}$ to measure complexity.
Notice that $\hat{C}$ describes Sands' simulation clause for program improvement [62].
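The clause of Example 14 can be unfolded into a direct check on finite data. This is a sketch under the same illustrative encoding as before: `None` stands for $\bot$, a pair `(n, x)` for $\mathit{just}\ (n, x)$, and the name `lift_C` is hypothetical.

```python
BOTTOM = None  # stands for divergence


def lift_C(R):
    """c is related to d iff c is bottom, or c = (n, x) and d = (m, y)
    with n >= m and x R y (the right-hand side is at least as cheap)."""
    def related(c, d):
        if c is BOTTOM:
            return True
        if d is BOTTOM:
            return False
        (n, x), (m, y) = c, d
        return n >= m and (x, y) in R
    return related


R = {("slow", "fast")}
```

A result computed in five steps is related to a related result computed in three, but not the other way around.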


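Example 15. For the global state monad $\mathbb{G}$ we define the map
$\hat{G} : \mathrm{Rel}(X, Y) \to \mathrm{Rel}(GX, GY)$ as $\alpha\ \hat{G}R\ \beta \iff \forall \sigma \in S.\ \alpha(\sigma)\ (\mathsf{I}_S \times R)\ \beta(\sigma)$. It is straightforward
to see that $\hat{G}$ is a relator for $\mathbb{G}$.


It is not hard to see that we can extend $\hat{G}$ to relators for $\mathbb{M} \otimes \mathbb{G}$, $\mathbb{DM} \otimes \mathbb{G}$, and
$\mathbb{C} \otimes \mathbb{G}$. In fact, Proposition 1 extends to relators.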


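Proposition 3. Given a monad $\mathbb{T} = \langle T, t, -^{T}\rangle$ and a relator $\hat{T}$ for $\mathbb{T}$, define
the sum $\widehat{TM}$ of $\hat{T}$ and $\hat{M}$ as $\hat{T}\hat{M}$. Additionally, define the tensor $\hat{T} \otimes \hat{G}$ of $\hat{T}$ and $\hat{G}$
by $\alpha\ (\hat{T} \otimes \hat{G})R\ \beta$ if and only if $\forall \sigma.\ \alpha(\sigma)\ \hat{T}(\mathsf{I}_S \times R)\ \beta(\sigma)$. Then $\widehat{TM}$ is a relator for
$\mathbb{TM}$, and $\hat{T} \otimes \hat{G}$ is a relator for $\mathbb{T} \otimes \mathbb{G}$.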


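Finally, we require relators to properly interact with the $\Sigma$-continuous structure
of monads.


Definition 5. Let $\mathbb{T} = \langle T, \eta, -^{\dagger}\rangle$ be a $\Sigma$-continuous monad and $\Gamma$ be a relator for
$\mathbb{T}$. We say that $\Gamma$ is $\Sigma$-continuous if it satisfies the following clauses (called the
inductive conditions) for any $\omega$-chain $(x_n)_n$ in $TX$, element $y \in TY$, elements
$x, x' \in TX$, and relation $R : X \nrightarrow Y$.

\[ \bot\ \Gamma R\ y, \qquad x \sqsubseteq x',\ x'\ \Gamma R\ y \implies x\ \Gamma R\ y, \qquad \forall n.\ x_n\ \Gamma R\ y \implies \textstyle\bigsqcup_n x_n\ \Gamma R\ y. \]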


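The relators $\hat{M}$, $\widehat{DM}$, $\hat{C}$, $\hat{M} \otimes \hat{G}$, $\widehat{DM} \otimes \hat{G}$, $\hat{C} \otimes \hat{G}$ are all $\Sigma$-continuous. The
reader might have noticed that we have not imposed any condition on how
relators should interact with algebraic operations. Nonetheless, it would be quite
natural to require a relator $\Gamma$ to satisfy condition (rel 9) below, for every operation
symbol $\mathbf{op} : P \rightsquigarrow I \in \Sigma$, maps $\kappa, \nu : I \to TX$, parameter $p \in P$, and relation $R$.

\[ \forall i \in I.\ \kappa(i)\ \Gamma R\ \nu(i) \implies [\![\mathbf{op}]\!](p, \kappa)\ \Gamma R\ [\![\mathbf{op}]\!](p, \nu) \qquad \text{(rel 9)} \]

Remarkably, if $\mathbb{T}$ is $\Sigma$-algebraic, then any relator for $\mathbb{T}$ satisfies (rel 9) (cf. [15]).


Proposition 4. Let $\mathbb{T} = \langle T, \eta, -^{\dagger}\rangle$ be a $\Sigma$-algebraic monad, and let $\Gamma$ be a
relator for $\mathbb{T}$. Then $\Gamma$ satisfies condition (rel 9).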


Having defined relators and their basic properties, we now introduce the
notion of an effectful eager normal form (bi)simulation.


6 Effectful Eager Normal Form (Bi)simulation


In this section we tacitly assume a $\Sigma$-continuous monad $\mathbb{T} = \langle T, \eta, -^{\dagger}\rangle$ and a
$\Sigma$-continuous relator $\Gamma$ for it to be fixed. $\Sigma$-continuity of $\Gamma$ is not required for
defining effectful eager normal form (bi)simulation, but it is crucial to prove
that the induced notions of similarity and bisimilarity are precongruence and
congruence relations, respectively.

Working with effectful calculi, it is important to distinguish between relations
over terms and relations over eager normal forms. For that reason we will work
with pairs of relations of the form $(R_{\Lambda} : \Lambda \nrightarrow \Lambda, R_{\mathcal{E}} : \mathcal{E} \nrightarrow \mathcal{E})$, which we call
$\lambda$-term relations (or term relations, for short). We use letters $R, S, \ldots$ to denote
term relations. The collection of $\lambda$-term relations (i.e. $\mathrm{Rel}(\Lambda, \Lambda) \times \mathrm{Rel}(\mathcal{E}, \mathcal{E})$) inherits
a complete lattice structure from $\mathrm{Rel}(\Lambda, \Lambda)$ and $\mathrm{Rel}(\mathcal{E}, \mathcal{E})$ pointwise, hence
allowing $\lambda$-term relations to be defined both inductively and coinductively. We
use these properties to define our notion of effectful eager normal form similarity.


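Definition 6. A term relation $R = (R_{\Lambda} : \Lambda \nrightarrow \Lambda, R_{\mathcal{E}} : \mathcal{E} \nrightarrow \mathcal{E})$ is an effectful
eager normal form simulation with respect to $\Gamma$ (hereafter enf-simulation, as $\Gamma$
will be clear from the context) if the following conditions hold, where in condition
(enf 4) $z \notin FV(E) \cup FV(E')$.

\[
\begin{aligned}
e\ R_{\Lambda}\ f &\implies [\![e]\!]\ \Gamma R_{\mathcal{E}}\ [\![f]\!], && \text{(enf 1)} \\
x\ R_{\mathcal{E}}\ s &\implies s = x, && \text{(enf 2)} \\
\lambda x.e\ R_{\mathcal{E}}\ s &\implies \exists f.\ s = \lambda x.f \land e\ R_{\Lambda}\ f, && \text{(enf 3)} \\
E[xv]\ R_{\mathcal{E}}\ s &\implies \exists E', v'.\ s = E'[xv'] \land v\ R_{\mathcal{E}}\ v' \land \exists z.\ E[z]\ R_{\Lambda}\ E'[z]. && \text{(enf 4)}
\end{aligned}
\]

We say that relation $R$ respects enfs if it satisfies conditions (enf 2)-(enf 4).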


Definition 6 is quite standard. Clause (enf 1) is morally the same clause on
terms used to define effectful applicative similarity in [15]. Clauses (enf 2) and
(enf 3) state that whenever two enfs are related by $R_{\mathcal{E}}$, then they must have
the same outermost syntactic structure, and their subterms must be pairwise
related. For instance, if $\lambda x.e\ R_{\mathcal{E}}\ s$ holds, then $s$ must be a $\lambda$-abstraction, i.e. an
expression of the form $\lambda x.f$, and $e$ and $f$ must be related by $R_{\Lambda}$.

Clause (enf 4) is the most interesting one. It states that whenever $E[xv]\ R_{\mathcal{E}}\ s$,
then $s$ must be a stuck term $E'[xv']$, for some evaluation context $E'$ and value
$v'$. Notice that $E[xv]$ and $s$ must have the same 'stuck variable' $x$. Additionally,
$v$ and $v'$ must be related by $R_{\mathcal{E}}$, and $E$ and $E'$ must be properly related
too. The idea is that to see whether $E$ and $E'$ are related, we replace the stuck
expressions $xv$, $xv'$ with a fresh variable $z$, and test $E[z]$ and $E'[z]$ (thus resuming
the evaluation process). We require $E[z]\ R_{\Lambda}\ E'[z]$ to hold, for some fresh
variable $z$. The choice of the variable does not really matter, provided it is fresh.
In fact, as we will see, effectful eager normal form similarity $\preceq^{\mathsf{E}}$ is substitutive
and reflexive. In particular, if $E[z] \preceq^{\mathsf{E}}_{\Lambda} E'[z]$ holds, then $E[y] \preceq^{\mathsf{E}}_{\Lambda} E'[y]$ holds as
well, for any variable $y \notin FV(E) \cup FV(E')$.

Notice that Definition 6 does not involve any universal quantification. In
particular, enfs are tested by inspecting their syntactic structure, thus making
the definition of an enf-simulation somewhat 'local': terms are tested in isolation
and not via their interaction with the environment. This is a major difference
with e.g. applicative (bi)simulation, where the environment interacts with
$\lambda$-abstractions by passing them arbitrary (closed) values as arguments.

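Definition 6 induces a functional $R \mapsto [R]$ on the complete lattice $\mathrm{Rel}(\Lambda, \Lambda) \times \mathrm{Rel}(\mathcal{E}, \mathcal{E})$,
where $[R] = ([R]_{\Lambda}, [R]_{\mathcal{E}})$ is defined as follows (here $\mathsf{I}_{\mathcal{X}}$ denotes the
identity relation on variables, i.e. the set of pairs of the form $(x, x)$):

\[
\begin{aligned}
[R]_{\Lambda} &\triangleq \{(e, f) \mid [\![e]\!]\ \Gamma R_{\mathcal{E}}\ [\![f]\!]\}, \\
[R]_{\mathcal{E}} &\triangleq \mathsf{I}_{\mathcal{X}} \cup \{(\lambda x.e, \lambda x.f) \mid e\ R_{\Lambda}\ f\} \\
&\quad \cup \{(E[xv], E'[xv']) \mid v\ R_{\mathcal{E}}\ v' \land \exists z \notin FV(E) \cup FV(E').\ E[z]\ R_{\Lambda}\ E'[z]\}.
\end{aligned}
\]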


It is easy to see that a term relation $R$ is an enf-simulation if and only if
$R \subseteq [R]$. Notice also that although $[R]$ always contains the identity relation
on variables, $R$ does not have to: the empty relation $(\emptyset, \emptyset)$ is an enf-simulation.
Finally, since relators are monotone (condition (rel 4)), $R \mapsto [R]$ is monotone
too. As a consequence, by the Knaster-Tarski Theorem [70], it has a greatest fixed
point which we call effectful eager normal form similarity with respect to $\Gamma$
(hereafter enf-similarity) and denote by $\preceq^{\mathsf{E}} = (\preceq^{\mathsf{E}}_{\Lambda}, \preceq^{\mathsf{E}}_{\mathcal{E}})$. Enf-similarity is thus
the largest enf-simulation with respect to $\Gamma$. Moreover, $\preceq^{\mathsf{E}}$ being defined coinductively,
it comes with an associated coinduction proof principle stating that if
a term relation $R$ is an enf-simulation, then it is contained in $\preceq^{\mathsf{E}}$. Symbolically:
$R \subseteq [R] \implies R \subseteq\ \preceq^{\mathsf{E}}$.
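The coinduction principle can be illustrated on a much simpler, finite-state analogue that is not taken from the paper: ordinary similarity on a small labelled transition system. On a finite lattice the greatest fixed point of a monotone functional can be computed by iterating downwards from the top element, and any relation contained in its own image is contained in the result. The transition system and all names below are hypothetical.

```python
def greatest_fixed_point(functional, top):
    """Iterate a monotone functional downwards from the top of a finite lattice."""
    current = top
    while True:
        nxt = functional(current)
        if nxt == current:
            return current
        current = nxt


# Hypothetical transition system: state -> set of (label, successor).
trans = {
    "p": {("a", "p1")}, "p1": set(),
    "q": {("a", "q1"), ("b", "q2")}, "q1": set(), "q2": set(),
}
states = set(trans)


def sim_step(R):
    """Pairs (s, t) such that every move of s is matched by a move of t
    with the same label, leading to successors related by R."""
    return {(s, t) for s in states for t in states
            if all(any(a == b and (s1, t1) in R for (b, t1) in trans[t])
                   for (a, s1) in trans[s])}


top = {(s, t) for s in states for t in states}
similarity = greatest_fixed_point(sim_step, top)

# A candidate simulation: it is contained in its own image under sim_step.
candidate = {("p", "q"), ("p1", "q1")}
```

Showing that `candidate` is a post-fixed point is enough to conclude that "p" is simulated by "q", without computing the whole of `similarity`.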


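Example 16. We use the coinduction proof principle to show that $\preceq^{\mathsf{E}}$ contains the
$\beta$-rule, viz. $(\lambda x.e)v \preceq^{\mathsf{E}}_{\Lambda} e[v/x]$. For that, we simply observe that the term relation
$(\{((\lambda x.e)v, e[v/x])\}, \mathsf{I}_{\mathcal{E}})$ is an enf-simulation. Indeed, $[\![(\lambda x.e)v]\!] = [\![e[v/x]]\!]$, so that
by (rel 1) we have $[\![(\lambda x.e)v]\!]\ \Gamma \mathsf{I}_{\mathcal{E}}\ [\![e[v/x]]\!]$.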
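The equality of the two evaluations can be unfolded from Definition 2, taking the empty evaluation context in the clause for $\beta$-redexes. The derivation below is a sketch of that step; the first equality uses that the approximant at index 0 is $\bot$, so dropping it does not change the least upper bound of the chain.

```latex
\begin{aligned}
[\![(\lambda x.e)v]\!]
  &= \bigsqcup_{n} [\![(\lambda x.e)v]\!]_{n+1} \\
  &= \bigsqcup_{n} [\![e[v/x]]\!]_{n} \\
  &= [\![e[v/x]]\!].
\end{aligned}
```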


Finally, we define effectful eager normal form bisimilarity. 


Definition 7. A term relation $R$ is an effectful eager normal form bisimulation
with respect to $\Gamma$ (enf-bisimulation, for short) if it is a symmetric enf-simulation.
Eager normal form bisimilarity with respect to $\Gamma$ (enf-bisimilarity, for
short) $\simeq^{\mathsf{E}}$ is the largest symmetric enf-simulation. In particular, enf-bisimilarity
(with respect to $\Gamma$) coincides with enf-similarity with respect to $\Gamma \wedge \Gamma^{\circ}$.


Example 17. We show that the probabilistic call-by-value fixed point combinators
$Y$ and $Z$ of Example 2 are enf-bisimilar. In light of Proposition 5, this
allows us to conclude that $Y$ and $Z$ are applicatively bisimilar, and thus contextually
equivalent [15]. Let us consider the relator $\widehat{DM}$ for probabilistic partial
computations. We show $Y \simeq^{\mathsf{E}} Z$ by coinduction, proving that the symmetric closure
of the term relation $R = (R_{\Lambda}, R_{\mathcal{E}})$ defined as follows is an enf-simulation:

\[
\begin{aligned}
R_{\Lambda} &\triangleq \{(Y, Z),\ (\Delta\Delta z, Zyz),\ (\Delta\Delta,\ y(\lambda z.\Delta\Delta z) \text{ or } y(\lambda z.Zyz))\} \cup \mathsf{I}_{\Lambda}, \\
R_{\mathcal{E}} &\triangleq \{(y(\lambda z.\Delta\Delta z),\ y(\lambda z.Zyz)),\ (\lambda z.\Delta\Delta z,\ \lambda z.Zyz), \\
&\qquad (\lambda y.\Delta\Delta,\ \lambda y.(y(\lambda z.\Delta\Delta z) \text{ or } y(\lambda z.Zyz))),\ (y(\lambda z.\Delta\Delta z)z,\ y(\lambda z.Zyz)z)\} \cup \mathsf{I}_{\mathcal{E}}.
\end{aligned}
\]


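The term relation $R$ is obtained from the relation $\{(Y, Z)\}$ by progressively
adding terms and enfs according to clauses (enf 1)-(enf 4) in Definition 6. Checking
that $R$ is an enf-simulation is straightforward. As an illustrative example,
we prove that $\Delta\Delta z\ R_{\Lambda}\ Zyz$ implies $[\![\Delta\Delta z]\!]\ \widehat{DM}(R_{\mathcal{E}})\ [\![Zyz]\!]$. The latter amounts
to showing:

\[ (1 \cdot \mathit{just}\ y(\lambda z.\Delta\Delta z)z)\ \ \widehat{DM}(R_{\mathcal{E}})\ \ (\tfrac{1}{2} \cdot \mathit{just}\ y(\lambda z.\Delta\Delta z)z + \tfrac{1}{2} \cdot \mathit{just}\ y(\lambda z.Zyz)z), \]

where, as usual, we write distributions as weighted formal sums. To prove the
latter, it is sufficient to find a suitable coupling of $[\![\Delta\Delta z]\!]$ and $[\![Zyz]\!]$. Define the
distribution $\omega \in D(M\mathcal{E} \times M\mathcal{E})$ as follows:

\[
\begin{aligned}
\omega(\mathit{just}\ y(\lambda z.\Delta\Delta z)z,\ \mathit{just}\ y(\lambda z.\Delta\Delta z)z) &\triangleq \tfrac{1}{2}, \\
\omega(\mathit{just}\ y(\lambda z.\Delta\Delta z)z,\ \mathit{just}\ y(\lambda z.Zyz)z) &\triangleq \tfrac{1}{2},
\end{aligned}
\]

and assigning zero to all other pairs in $M\mathcal{E} \times M\mathcal{E}$. Obviously $\omega$ is a coupling of
$[\![\Delta\Delta z]\!]$ and $[\![Zyz]\!]$. Additionally, we see that $\omega(x, y) > 0$ implies $x\ R_{\mathcal{E}}\ y$, since both
$y(\lambda z.\Delta\Delta z)z\ R_{\mathcal{E}}\ y(\lambda z.\Delta\Delta z)z$, and $y(\lambda z.\Delta\Delta z)z\ R_{\mathcal{E}}\ y(\lambda z.Zyz)z$ hold.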
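The coupling used in this example is small enough to be checked mechanically. In the sketch below the two stuck terms are represented by plain strings, which is an encoding chosen for illustration only, and the marginals of the joint distribution are recomputed with exact fractions.

```python
from fractions import Fraction

# The two eager normal forms involved, as opaque labels.
a = "just y(λz.ΔΔz)z"
b = "just y(λz.Zyz)z"

lhs = {a: Fraction(1)}                          # evaluation of ΔΔz
rhs = {a: Fraction(1, 2), b: Fraction(1, 2)}    # evaluation of Zyz
omega = {(a, a): Fraction(1, 2), (a, b): Fraction(1, 2)}
related = {(a, a), (a, b)}                      # pairs related by the enf component

# Recompute both marginals of omega.
left, right = {}, {}
for (x, y), p in omega.items():
    left[x] = left.get(x, Fraction(0)) + p
    right[y] = right.get(y, Fraction(0)) + p

in_support = all(pair in related for pair, p in omega.items() if p > 0)
```

The left marginal puts all the mass on the first stuck term and the right marginal splits it evenly, matching the two evaluations above.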

As already discussed in Example 2, the operational equivalence between 
Y and Z is an example of an equivalence that cannot be readily established 
using standard operational methods—such as CIU equivalence or applicative 
bisimilarity—but whose proof is straightforward using enf-bisimilarity. Addi- 
tionally, Theorem 3 will allow us to reduce the size of R, thus minimising the 
task of checking that our relation is indeed an enf-bisimulation. To the best of 
the authors’ knowledge, the probabilistic instance of enf-(bi)similarity is the first 
example of a probabilistic eager normal form (bi)similarity in the literature. 


6.1 Congruence and Precongruence Theorems 


In order for $\preceq^{\mathsf{E}}$ and $\simeq^{\mathsf{E}}$ to qualify as good notions of program refinement
and equivalence, respectively, they have to allow for compositional reasoning.
Roughly speaking, a term relation $R$ is compositional if the validity of the relationship
$C[e]\ R\ C[e']$ between compound terms $C[e]$, $C[e']$ follows from the validity
of the relationship $e\ R\ e'$ between the subterms $e$, $e'$. Mathematically, the notion
of compositionality is formalised through the notion of compatibility, which
directly leads to the notions of a precongruence and congruence relation. In this
section we prove that $\preceq^{\mathsf{E}}$ and $\simeq^{\mathsf{E}}$ are substitutive precongruence and congruence
relations, that is preorder and equivalence relations closed under term constructors
of $\Lambda_\Sigma$ and substitution, respectively. To prove such results, we generalise
Lassen's relational construction for the pure call-by-name $\lambda$-calculus [37]. Such
a construction has been previously adapted to the pure call-by-value $\lambda$-calculus
(and its extension with delimited and abortive control operators) in [9], whereas
Lassen has proved compatibility of pure eager normal form bisimilarity via a
CPS translation [38]. Both those proofs rely on syntactical properties of the calculus
(mostly expressed using suitable small-step semantics), and thus seem to
be hardly adaptable to effectful calculi. On the contrary, our proofs rely on the
properties of relators, thereby making our results and techniques more modular
and thus valid for a large class of effects.

[Fig. 1: inference rules generating $R^{SC}$, including the following.
(sc-abs): from $e\ R^{SC}_{\Lambda}\ f$ infer $\lambda x.e\ R^{SC}_{\mathcal{E}}\ \lambda x.f$.
(sc-app): from $e_i\ R^{SC}_{\Lambda}\ e_i'$ infer $e_1 e_2\ R^{SC}_{\Lambda}\ e_1' e_2'$.
(sc-op): from $e\ R^{SC}_{\Lambda}\ f$ infer $\mathbf{op}(p, x.e)\ R^{SC}_{\Lambda}\ \mathbf{op}(p, x.f)$.
(sc-subst-val): from $v\ R^{SC}_{\mathcal{E}}\ v'$ and $w\ R^{SC}_{\mathcal{E}}\ w'$ infer $v[w/x]\ R^{SC}_{\mathcal{E}}\ v'[w'/x]$.
(sc-subst): from $e\ R^{SC}_{\Lambda}\ e'$ and $v\ R^{SC}_{\mathcal{E}}\ v'$ infer $e[v/x]\ R^{SC}_{\Lambda}\ e'[v'/x]$.
(sc-stuck): from $E[z]\ R^{SC}_{\Lambda}\ E'[z]$ and $v\ R^{SC}_{\mathcal{E}}\ v'$ infer $E[xv]\ R^{SC}_{\mathcal{E}}\ E'[xv']$.
(sc-ectx): from $E[z]\ R^{SC}_{\Lambda}\ E'[z]$ and $e\ R^{SC}_{\Lambda}\ e'$ infer $E[e]\ R^{SC}_{\Lambda}\ E'[e']$.]

Fig. 1. Compatible and substitutive closure construction.

We begin by proving precongruence of enf-similarity. The central tool we use to
prove the desired precongruence theorem is the so-called (substitutive) context
closure [37] $R^{SC}$ of a term relation $R$, which is inductively defined by the rules
in Fig. 1, where $x \in \{\Lambda, \mathcal{E}\}$, $i \in \{1, 2\}$, and $z \notin FV(E) \cup FV(E')$.

We easily see that $R^{SC}$ is the smallest term relation that contains $R$, is
closed under language constructors of $\Lambda_\Sigma$ (a property known as compatibility
[5]), and is closed under the substitution operation (a property known as
substitutivity [5]). As a consequence, we say that a term relation $R$ is a substitutive
compatible relation if $R^{SC} \subseteq R$ (and thus $R = R^{SC}$). If, additionally, $R$
is a preorder (resp. equivalence) relation, then we say that $R$ is a substitutive
precongruence (resp. substitutive congruence) relation.

We are now going to prove that if $R$ is an enf-simulation, then so is $R^{SC}$. In
particular, we will infer that $(\preceq^{\mathsf{E}})^{SC}$ is an enf-simulation, and thus it is contained
in $\preceq^{\mathsf{E}}$, by coinduction.


Lemma 2 (Main Lemma). If $R$ is an enf-simulation, then so is $R^{SC}$.

Proof (sketch). The proof is long and non-trivial. Due to space constraints here
we simply give some intuitions behind it. First, a routine proof by induction
shows that since $R$ respects enfs, then so does $R^{SC}$. Next, we wish to prove
that $e\ R^{SC}_{\Lambda}\ f$ implies $[\![e]\!]\ \Gamma R^{SC}_{\mathcal{E}}\ [\![f]\!]$. Since $\Gamma$ is inductive, the latter follows if
for any $n \geq 0$, $e\ R^{SC}_{\Lambda}\ f$ implies $[\![e]\!]_n\ \Gamma R^{SC}_{\mathcal{E}}\ [\![f]\!]$. We prove the latter implication
by lexicographic induction on (1) the natural number $n$ and (2) the derivation
of $e\ R^{SC}_{\Lambda}\ f$. The case for $n = 0$ is trivial (since $\Gamma$ is inductive). The remaining
cases are nontrivial, and are handled observing that $[\![E[e]]\!] = (s \mapsto [\![E[s]]\!])^{\dagger}\,[\![e]\!]$
and $[\![e[v/x]]\!]_n \sqsubseteq [\![-[v/x]]\!]_n^{\dagger}\,[\![e]\!]_n$. Both these facts allow us to apply condition
(rel 8) to simplify proof obligations (usually relying on part (2) of the induction
hypothesis as well). This scheme is iterated until we reach either an enf (in which
case we are done by condition (rel 7)) or a pair of expressions on which we can
apply part (1) of the induction hypothesis.


Theorem 2. Enf-similarity (resp. bisimilarity) is a substitutive precongruence 
(resp. congruence) relation. 


Proof. We show that enf-similarity is a substitutive precongruence relation. By
Lemma 2, it is sufficient to show that $\preceq^{E}$ is a preorder. This follows by coinduction,
since the term relations $\mathsf{I}$ and $\preceq^{E} \circ \preceq^{E}$ are enf-simulations (the proofs make
use of conditions (rel 1) and (rel 2), as well as of substitutivity of $\preceq^{E}$).

Finally, we show that enf-bisimilarity is a substitutive congruence relation.
Obviously $\simeq^{E}$ is an equivalence relation, so it is sufficient to prove $(\simeq^{E})^{SC} \subseteq \simeq^{E}$.
That directly follows by coinduction relying on Lemma 2, provided that
$(\simeq^{E})^{SC}$ is symmetric. An easy inspection of the rules in Fig. 1 reveals that $\mathcal{R}^{SC}$ is
symmetric whenever $\mathcal{R}$ is.


6.2 Soundness for Effectful Applicative (Bi)similarity 


Theorem 2 qualifies enf-bisimilarity and enf-similarity as good candidate notions
of program equivalence and refinement for $\Lambda_\Sigma$, at least from a structural
perspective. However, we motivated such notions by looking at specific examples
where effectful applicative (bi)similarity is ineffective. It is then natural to ask
whether enf-(bi)similarity can be used as a proof technique for effectful
applicative (bi)similarity.

Here we give a formal comparison between enf-(bi)similarity and effectful
applicative (bi)similarity, as defined in [15]. First of all, we rephrase the notion of
an effectful applicative (bi)simulation of [15] to our calculus $\Lambda_\Sigma$. For that, we use
the following notational convention. Let $\Lambda_0$, $\mathcal{V}_0$ denote the collections of closed
terms and closed values, respectively. We notice that if $e \in \Lambda_0$, then $\llbracket e \rrbracket \in T\mathcal{V}_0$.
As a consequence, $\llbracket - \rrbracket$ induces a closed evaluation function $|-| : \Lambda_0 \to T\mathcal{V}_0$
characterised by the identity $\llbracket - \rrbracket \circ \iota = T\iota \circ |-|$, where $\iota : \mathcal{V}_0 \hookrightarrow \mathcal{E}$ is the obvious
inclusion map. We can thus phrase the definition of effectful applicative similarity
(with respect to a relator $\Gamma$) as follows.


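Definition 8. A term relation $\mathcal{R} = (\mathcal{R}_{\Lambda_0} : \Lambda_0 \nvrightarrow \Lambda_0, \mathcal{R}_{\mathcal{V}_0} : \mathcal{V}_0 \nvrightarrow \mathcal{V}_0)$ is
an effectful applicative simulation with respect to $\Gamma$ (applicative simulation, for
short) if:

$e \mathrel{\mathcal{R}_{\Lambda_0}} f \implies |e| \mathrel{\Gamma\mathcal{R}_{\mathcal{V}_0}} |f|$, (app 1)

$\lambda x.e \mathrel{\mathcal{R}_{\mathcal{V}_0}} \lambda x.f \implies \forall v \in \mathcal{V}_0.\ e[v/x] \mathrel{\mathcal{R}_{\Lambda_0}} f[v/x]$. (app 2)

As usual, we can define effectful applicative similarity with respect to $\Gamma$
(applicative similarity, for short), denoted by $\preceq^{A}_0 = (\preceq^{A}_{\Lambda_0}, \preceq^{A}_{\mathcal{V}_0})$, coinductively as the
largest applicative simulation. Its associated coinduction proof principle states
that if a relation is an applicative simulation, then it is contained in applicative
similarity. Finally, we extend $\preceq^{A}_0$ to arbitrary terms by defining the relation
$\preceq^{A} = (\preceq^{A}_{\Lambda}, \preceq^{A}_{\mathcal{V}})$ as follows: let $e, f, w, u$ be terms and values with free variables
among $\bar{x} = x_1, \ldots, x_n$. We let $\bar{v}$ range over n-ary sequences of closed values
$v_1, \ldots, v_n$. Define:

$e \preceq^{A}_{\Lambda} f \iff \forall \bar{v}.\ e[\bar{v}/\bar{x}] \preceq^{A}_{\Lambda_0} f[\bar{v}/\bar{x}], \qquad w \preceq^{A}_{\mathcal{V}} u \iff \forall \bar{v}.\ w[\bar{v}/\bar{x}] \preceq^{A}_{\mathcal{V}_0} u[\bar{v}/\bar{x}].$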


The following result states that enf-similarity is a sound proof technique for 
applicative similarity. 


Proposition 5. Enf-similarity $\preceq^{E}$ is included in applicative similarity $\preceq^{A}$.


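Proof. Let $\preceq^{c} = (\preceq^{c}_{\Lambda}, \preceq^{c}_{\mathcal{V}})$ denote enf-similarity restricted to closed terms and
values. We first show that $\preceq^{c}$ is an applicative simulation, from which it follows, by
coinduction, that it is included in $\preceq^{A}_0$. It is easy to see that $\preceq^{c}$ satisfies condition
(app 2). In order to prove that it also satisfies condition (app 1), we have to show
that for all $e, f \in \Lambda_0$, $e \preceq^{c}_{\Lambda} f$ implies $|e| \mathrel{\Gamma{\preceq^{c}_{\mathcal{V}}}} |f|$. Since $e \preceq^{c}_{\Lambda} f$ obviously implies
$\iota(e) \preceq^{E}_{\Lambda} \iota(f)$, by (enf 1) we infer $\llbracket \iota(e) \rrbracket \mathrel{\Gamma{\preceq^{E}_{\mathcal{E}}}} \llbracket \iota(f) \rrbracket$, and thus $T\iota|e| \mathrel{\Gamma{\preceq^{E}_{\mathcal{E}}}} T\iota|f|$.
By stability of $\Gamma$, the latter implies $|e| \mathrel{\Gamma(\iota^{\circ} \circ {\preceq^{E}_{\mathcal{E}}} \circ \iota)} |f|$, and thus the desired
thesis, since $\iota^{\circ} \circ {\preceq^{E}_{\mathcal{E}}} \circ \iota$ is nothing but $\preceq^{c}_{\mathcal{V}}$. Finally, we show that for all terms
$e, f$, if $e \preceq^{E}_{\Lambda} f$, then $e \preceq^{A}_{\Lambda} f$ (a similar result holds mutatis mutandis for values, so
that we can conclude $\preceq^{E} \subseteq \preceq^{A}$). Indeed, suppose $FV(e) \cup FV(f) \subseteq \bar{x}$; then by
substitutivity of $\preceq^{E}$ we have that $e \preceq^{E}_{\Lambda} f$ implies $e[\bar{v}/\bar{x}] \preceq^{E}_{\Lambda} f[\bar{v}/\bar{x}]$, for all closed
values $\bar{v}$ (notice that since we are substituting closed values, sequential and
simultaneous substitution coincide). That essentially means $e[\bar{v}/\bar{x}] \preceq^{c}_{\Lambda} f[\bar{v}/\bar{x}]$,
and thus $e[\bar{v}/\bar{x}] \preceq^{A}_{\Lambda_0} f[\bar{v}/\bar{x}]$. We thus conclude $e \preceq^{A}_{\Lambda} f$.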


Since in [15] it is shown that effectful applicative similarity (resp. bisimilarity) is 
contained in effectful contextual approximation (resp. equivalence), Proposition 
5 gives the following result. 


Corollary 1. Enf-similarity and enf-bisimilarity are sound proof techniques for 
contextual approximation and equivalence, respectively. 


Although sound, enf-bisimilarity is not fully abstract for applicative bisimilarity.
In fact, as already observed in [38], in the pure λ-calculus enf-bisimilarity is
strictly finer than applicative bisimilarity (and thus strictly finer than contextual
equivalence too). For instance, the terms $xv$ and $(\lambda y.xv)(xv)$ are obviously
applicatively bisimilar but not enf-bisimilar.


6.3 Eager Normal Form (Bi)simulation Up-to Context 


The up-to context technique [37,60,64] is a refinement of the coinduction proof
principle of enf-(bi)similarity that allows for handier proofs of equivalence and
refinement between terms. When exhibiting a candidate enf-(bi)simulation relation
$\mathcal{R}$, it is desirable for $\mathcal{R}$ to be as small as possible, so as to minimise the task
of verifying that $\mathcal{R}$ is indeed an enf-(bi)simulation.

The motivation behind such a technique can be easily seen by looking at Example
17, where we showed the equivalence between the probabilistic fixed point
combinators Y and Z by working with relations containing several administrative
pairs of terms. The presence of such pairs was forced by Definition 7, although
they appear somewhat unnecessary for convincing oneself that Y and Z exhibit
the same operational behaviour.

Enf-(bi)simulation up-to context is a refinement of enf-(bi)simulation that
allows one to check that a relation $\mathcal{R}$ behaves as an enf-(bi)simulation relation up to
its substitutive and compatible closure.


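Definition 9. A term relation $\mathcal{R} = (\mathcal{R}_{\Lambda} : \Lambda \nvrightarrow \Lambda, \mathcal{R}_{\mathcal{E}} : \mathcal{E} \nvrightarrow \mathcal{E})$ is an effectful
eager normal form simulation up-to context with respect to $\Gamma$ (enf-simulation
up-to context, hereafter) if it satisfies the following conditions, where in condition
(up-to 4) $z \notin FV(E) \cup FV(E')$.


$e \mathrel{\mathcal{R}_{\Lambda}} f \implies \llbracket e \rrbracket \mathrel{\Gamma\mathcal{R}^{SC}_{\mathcal{E}}} \llbracket f \rrbracket$, (up-to 1)
$x \mathrel{\mathcal{R}_{\mathcal{E}}} s \implies s = x$, (up-to 2)
$\lambda x.e \mathrel{\mathcal{R}_{\mathcal{E}}} s \implies \exists f.\ s = \lambda x.f \wedge e \mathrel{\mathcal{R}^{SC}_{\Lambda}} f$, (up-to 3)
$E[xv] \mathrel{\mathcal{R}_{\mathcal{E}}} s \implies \exists E', v'.\ s = E'[xv'] \wedge v \mathrel{\mathcal{R}^{SC}_{\mathcal{E}}} v' \wedge \exists z.\ E[z] \mathrel{\mathcal{R}^{SC}_{\Lambda}} E'[z]$. (up-to 4)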


In order for the up-to context technique to be sound, we need to show that 
every enf-simulation up-to context is contained in enf-similarity. This is a direct 
consequence of the following variation of Lemma 2. 


Lemma 3. If $\mathcal{R}$ is an enf-simulation up-to context, then $\mathcal{R}^{SC}$ is an enf-simulation.


Proof. The proof is structurally identical to the one of Lemma 2, where we simply 
observe that wherever we use the assumption that R is an enf-simulation, we 
can use the weaker assumption that R is an enf-simulation up-to context. 


In particular, since by Lemma 2 we have that $\preceq^{E} = (\preceq^{E})^{SC}$, we see that
enf-similarity is an enf-simulation up-to context. Additionally, by Lemma 3 it is the
largest such. Since the same result holds for enf-bisimilarity and enf-bisimilarity
up-to context, we have the following theorem.


Theorem 3. Enf-similarity is the largest enf-simulation up-to context, and
enf-bisimilarity is the largest enf-bisimulation up-to context.


Example 18. We apply Theorem 3 to simplify the proof of the equivalence
between Y and Z given in Example 17. In fact, it is sufficient to show that
the symmetric closure of the term relation $\mathcal{R}$ defined below is an enf-bisimulation
up-to context.


$\mathcal{R}_{\Lambda} \triangleq \{(Y, Z),\ (\Delta\Delta z, Zyz),\ (\Delta\Delta,\ y(\lambda z.\Delta\Delta z) \textbf{ or } y(\lambda z.Zyz))\}, \qquad \mathcal{R}_{\mathcal{E}} \triangleq \mathsf{I}_{\mathcal{E}}.$

Example 19. Recall the fixed point combinators with ticking operations Y and
Z of Example 4. Let us consider the relator Ĉ. It is not hard to see that Y and
Z are not enf-bisimilar (that is because the ticking operation is evaluated at
different moments, so to speak). Nonetheless, once we pass them a variable $x_0$
as argument, we have $Zx_0 \preceq^{E} Yx_0$. To see this, observe that the term relation $\mathcal{R}$ defined
below is an enf-simulation up-to context.


$\mathcal{R}_{\Lambda} \triangleq \{(Yx_0, Zx_0),\ (\mathbf{tick}(\Delta[x_0/y]\Delta[x_0/y]z),\ \mathbf{tick}(\Theta\Theta x_0 z))\}, \qquad \mathcal{R}_{\mathcal{E}} \triangleq \emptyset.$


Intuitively, Y executes a tick first, and then proceeds by iterating the evaluation
of $\Delta[x_0/y]\Delta[x_0/y]$, the latter involving two tickings only. On the contrary, Z
proceeds by recursively calling itself, hence involving three tickings at any iteration,
so to speak. Since $\preceq^{E}$ is substitutive, for any value $v$ we have $Zv \preceq^{E} Yv$.


Theorem 3 makes enf-(bi)similarity an extremely powerful proof technique for
program equivalence/refinement, especially because it is still unknown whether
there exist sound up-to context techniques for applicative (bi)similarity [35].


6.4 Weak Head Normal Form (Bi)simulation 


So far we have focused on call-by-value calculi, since in the presence of effects the
call-by-value evaluation strategy seems the more natural one. Nonetheless, our
framework can be easily adapted to deal with call-by-name calculi too. In this last
section we spend some words on effectful weak head normal form (bi)similarity
(whnf-(bi)similarity, for short). The latter is nothing but the call-by-name
counterpart of enf-(bi)similarity. The main difference between enf-(bi)similarity and
whnf-(bi)similarity lies in the notion of an evaluation context (and thus of a
stuck term). In fact, in a call-by-name setting, $\Lambda_\Sigma$ evaluation contexts are
expressions of the form $[-]e_1 \cdots e_n$, which are somewhat simpler than their call-by-value
counterparts. Such simplicity is reflected in the definition of whnf-(bi)similarity,
which allows one to prove mutatis mutandis all results proved for enf-(bi)similarity
(such results are, without much of a surprise, actually easier to prove).

We briefly expand on that. The collection of weak head normal forms (whnfs,
for short) $\mathcal{W}$ is defined as the union of $\mathcal{V}$ and the collection of stuck terms, the
latter being expressions of the form $x e_1 \cdots e_n$. The evaluation function of Definition
2 now maps terms to elements in $T\mathcal{W}$, and it is essentially obtained by modifying
Definition 2, defining $\llbracket E[xe] \rrbracket_{n+1} = \eta(E[xe])$ and $\llbracket E[(\lambda x.f)e] \rrbracket_{n+1} = \llbracket E[f[e/x]] \rrbracket_n$.
The notion of a whnf-(bi)simulation (and thus the notions of whnf-(bi)similarity)
is obtained by modifying Definition 6 accordingly. In particular, clauses (enf 2)
and (enf 4) are replaced by the following clause, where we use the notation
$\mathcal{R} = (\mathcal{R}_{\Lambda} : \Lambda \nvrightarrow \Lambda, \mathcal{R}_{\mathcal{W}} : \mathcal{W} \nvrightarrow \mathcal{W})$ to denote a (call-by-name) λ-term relation.


$x e_0 \cdots e_k \mathrel{\mathcal{R}_{\mathcal{W}}} s \implies \exists f_0, \ldots, f_k.\ s = x f_0 \cdots f_k \wedge \forall i.\ e_i \mathrel{\mathcal{R}_{\Lambda}} f_i.$
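The step-indexed call-by-name evaluation described above can be illustrated with a small Python sketch. It covers only the pure, effect-free fragment: the monad T is collapsed to "a term or None" (None standing for the bottom element), no operations from the signature are modelled, and all names (Var, Lam, App, fresh_name, subst, whnf) are ours rather than the paper's. It is meant as an illustration of the two defining clauses, not as the general construction.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional, Set, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def free_vars(t: Term) -> Set[str]:
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.fun) | free_vars(t.arg)


def fresh_name(base: str, avoid: Set[str]) -> str:
    # Hypothetical helper: first of base0, base1, ... that is not in `avoid`.
    for i in count():
        candidate = f"{base}{i}"
        if candidate not in avoid:
            return candidate
    raise AssertionError("unreachable")


def subst(t: Term, x: str, e: Term) -> Term:
    """Capture-avoiding substitution t[e/x]."""
    if isinstance(t, Var):
        return e if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fun, x, e), subst(t.arg, x, e))
    if t.param == x:  # x is shadowed by the binder
        return t
    if t.param in free_vars(e):  # rename the binder to avoid capture
        new = fresh_name(t.param, free_vars(e) | free_vars(t.body) | {x})
        t = Lam(new, subst(t.body, t.param, Var(new)))
    return Lam(t.param, subst(t.body, x, e))


def whnf(t: Term, n: int) -> Optional[Term]:
    """n-th approximation of call-by-name evaluation; None plays the role of bottom."""
    if n == 0:
        return None
    # Decompose t as E[h], with E = [-] e1 ... ek and h not an application.
    args = []
    head = t
    while isinstance(head, App):
        args.append(head.arg)
        head = head.fun
    if isinstance(head, Lam) and args:
        # Clause for E[(lambda x. f) e] at index n+1: continue with E[f[e/x]] at index n.
        reduct = subst(head.body, head.param, args.pop())
        for a in reversed(args):
            reduct = App(reduct, a)
        return whnf(reduct, n - 1)
    # A lambda abstraction or a stuck term x e1 ... ek is returned as it is.
    return t


identity = Lam("x", Var("x"))
delta = Lam("x", App(Var("x"), Var("x")))
omega = App(delta, delta)
example = App(App(identity, Var("z")), Var("w"))
```

With these definitions, `whnf(example, 2)` reaches the stuck term `z w`, while `whnf(omega, n)` is `None` for every index `n`, mirroring the fact that a diverging term only ever evaluates to the bottom element.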


A straightforward modification of the rules in Fig. 1 allows one to prove an
analogue of Lemma 2 for whnf-simulations, and thus to conclude (pre)congruence
properties of whnf-(bi)similarity. Additionally, such results generalise to
whnf-(bi)simulation up-to context, the latter being defined according to Definition
9, so that we have an analogue of Theorem 3 as well. The latter allows us
to infer the equivalence of the argument-switching fixed point combinators of
Example 3, simply by noticing that the symmetric closure of the term relation
$\mathcal{R} = (\{(P, Q), (Pyz, Qzy), (Pzy, Qyz)\}, \emptyset)$ is a whnf-bisimulation up-to context.

Finally, it is straightforward to observe that whnf-(bi)similarity is included
in the call-by-name counterpart of effectful applicative (bi)similarity, but that
the inclusion is strict. In fact, the (pure λ-calculus) terms $xx$ and $x(\lambda y.xy)$ are
applicatively bisimilar, but not whnf-bisimilar.


7 Related Work 


Normal form (bi)similarity was originally introduced for the call-by-name
λ-calculus in [65], where it was called open bisimilarity. Open bisimilarity provides
a coinductive characterisation of Lévy-Longo tree equivalence [42,45,53], and
has been shown to coincide with the equivalence (notably weak bisimilarity)
induced by Milner's encoding of the λ-calculus into the π-calculus [48].

In [37] normal form bisimilarity relations characterising both Böhm and
Lévy-Longo tree equivalences have been studied by purely operational means,
providing new congruence proofs of the aforementioned tree equivalences based on
suitable relational constructions. Such results have been extended to the
call-by-value λ-calculus in [38], where the so-called eager normal form bisimilarity is
introduced. The latter is shown to coincide with the Lévy-Longo tree equivalence
induced by a suitable CPS translation [54], and thus to be a congruence relation.
An elementary proof of congruence properties of eager normal form bisimilarity
is given in [9], where Lassen's relational construction [37] is extended to the
call-by-value λ-calculus, as well as its extensions with delimited and abortive
control operators. Finally, following [65], eager normal form bisimilarity has been
recently characterised as the equivalence induced by a suitable encoding of the
(call-by-value) λ-calculus in the π-calculus [21].

Concerning effectful extensions of normal form bisimilarity, our work seems
to be rather new. In fact, normal form bisimilarity has been studied for
deterministic extensions of the λ-calculus with specific non-algebraic effects, notably
control operators [9], as well as control and state [68] (where full abstraction of
the obtained notion of normal form bisimilarity is proved). The only extension
of normal form bisimilarity to an algebraic effect the authors are aware of is
given in [39], where normal form bisimilarity is studied for a nondeterministic
call-by-name λ-calculus. However, we should mention that, contrary to normal
form bisimilarity, both nondeterministic [20] and probabilistic [41] extensions of
Böhm tree equivalence have been investigated (although none of them employs,
to the best of the authors' knowledge, coinductive techniques).

8 Conclusion


This paper shows that effectful normal form bisimulation is indeed a powerful
methodology for program equivalence. Interestingly, the proof of congruence for
normal form bisimilarity can be given just once, without the necessity of redoing
it for every distinct notion of algebraic effect considered. This relies on the fact
that the underlying monad and relator are $\Sigma$-continuous, something which has
already been proved for many distinct notions of effects [15].

Topics for further work are plentiful. First of all, a natural question is whether 
the obtained notion of bisimilarity coincides with contextual equivalence. This 
is known not to hold in the deterministic case [37,38], but to hold in presence of 
control and state [68], which offer the environment the necessary discriminating 
power. Is there any (sufficient) condition on effects guaranteeing full abstraction 
of normal form bisimilarity? This is an intriguing question we are currently 
investigating. In fact, contrary to applicative bisimilarity (which is known to 
be unsound in presence of non-algebraic effects [33], such as local states), the 
syntactic nature of normal form bisimilarity seems to be well-suited for languages 
combining both algebraic and non-algebraic effects. 

Another interesting topic for future research is investigating whether normal
form bisimilarity can be extended to languages having both algebraic operations
and effect handlers [7,59].

Abstract. Modern software is no longer developed in a single programming
language. Instead, programmers tend to exploit cross-language
interoperability mechanisms to combine code stemming from different
languages, thus yielding fully-fledged multi-language programs.
Whilst this approach enables developers to benefit from the strengths of
each single language, it also complicates the semantics of
such programs. Indeed, the resulting multi-language does not meet any of
the semantics of the combined languages. In this paper, we broaden the
boundary functions-based approach à la Matthews and Findler to propose
an algebraic framework that provides a constructive mathematical
notion of multi-language able to determine its semantics. The aim of this
work is to overcome the lack of a formal method (resp., model) to design
(resp., represent) a multi-language, regardless of the inherent nature of
the underlying languages. We show that our construction ensures the
uniqueness of the semantic function (i.e., the multi-language semantics
induced by the combined languages) by proving the initiality of the term
model (i.e., the abstract syntax of the multi-language) in its category.


Keywords: Multi-language design · Program semantics · Interoperability


1 Introduction 


Two elementary arguments lie at the heart of the multi-language paradigm: the
large availability of existing programming languages, along with a very high
number of already written libraries, and software that, in general, needs to
interoperate. Although there is consensus in claiming that there is no best programming
language regardless of the context [4,8], it is equally true that many of them are
conceived and designed in order to excel at specific tasks. Examples are R
for statistical and graphical computation, Perl for data wrangling, Assembly and
C for low-level memory management, etc. "Interoperability between languages has
been a problem since the second programming language was invented" [8], so it is
hardly surprising that developers have focused on the design of cross-language
interoperability mechanisms, enabling programmers to combine code written in
different languages. In this sense, we speak of multi-languages.

The field of cross-language interoperability has been driven more by practical
concerns than by theoretical questions. The current scenario sees several engines
and frameworks [13,28,29,44,47] (among others) to mix programming languages,
but only [30] discusses the semantic issues related to the multi-language design
from a theoretical perspective. Moreover, the existing interoperability
mechanisms differ considerably not only from the viewpoint of the combined
languages, but also in terms of the approach used to provide the interoperation.
For instance, Nashorn [47] is a JavaScript interpreter written in Java to allow
embedding JavaScript in Java applications. Such an engineering design works in a
fashion similar to embedded interpreters [40,41].¹ On the contrary, the Java Native
Interface (JNI) framework [29] enables the interoperation of Java with native
code written in C, C++, or Assembly through external procedure calls between
languages, mirroring the widespread mechanism of foreign function interfaces
(FFI) [14], whereas theoretical papers follow the more elegant approach of
boundary functions (or, for short, boundaries) in the style of Matthews and Findler's
multi-language semantics [30]. Simply put, boundaries act as a gate between
single-languages. When a value needs to flow to the other language, they
perform a conversion so that it complies with the other language's specifications.

The major issue concerning this new paradigm is that multi-language
programs do not obey any of the semantics of the combined languages. As a
consequence, any method of formal reasoning (such as static program analysis or
verification) is neutralized by the absence of a semantics specification. In this
paper, we propose an algebraic framework based on the mechanism of boundary
functions [30] that unambiguously yields the syntax and the semantics of the
multi-language regardless of the combined languages.


The Lack of a Multi-Language Framework. The notion of multi-language is
employed naively in several works in the literature [2,14,21,30,35–37,49] to
indicate the embedding of two programming languages into a new one, with its own
syntax and semantics.

The most recurring way to design a multi-language is to exploit a mechanism
(like embedded interpreters, FFI, or boundary functions) able to regulate both
control flow and value conversion between the underlying languages [30], and thus
adequate to provide cross-language interoperability [8]. The full construction is
usually carried out manually by language designers, who define the
multi-language by reusing the formal specifications of the single-languages [2,30,36,37]
and by applying the selected mechanism for achieving the interoperation.
Inevitably, therefore, all these resulting multi-languages notably differ from one
another.

These different ways to achieve cross-language interoperation are all
attributable to the lack of a formal description of multi-language: without one,
language designers have neither a method to conceive new multi-languages
nor any guarantee on the correctness of such constructions.


1 Other popular engines that obey the embedded interpreters paradigm are
Jython [28], JScript [44], and Rhino [13].

The Proposed Framework: Roadmap and Contributions. Matthews and Findler
[30] propose boundary functions as a way to regulate the flow of values
between languages. They show their approach on different variants of the same
multi-language obtained by mixing ML [33] and Scheme [9], representing two
"syntactically sugared" versions of the simply-typed and untyped lambda
calculi, respectively.

Rather than showing the embedding of two fixed languages, we extend their
approach to the much broader class of order-sorted algebras [19] with the aim
of providing a framework that works regardless of the inherent nature of the
combined languages. There are a number of reasons to choose order-sorted
algebras as the underlying framework for generalizing the multi-language
construction. Since the first formulation of initial algebra semantics [17], the algebraic
approach to program semantics [16] has become a cornerstone in the theory of
programming languages [27]. Order-sorted algebras provide a mathematical tool
for representing formal systems as algebraic structures through a systematic
use of the notion of sort and subsort to model different forms of
polymorphism [18,19], a key aspect when dealing with multi-languages sharing
operators among the single-languages. They were initially proposed to ensure a
rigorous model-theoretic semantics for error handling, multiple inheritance,
retracts, selectors for multiple constructors, polymorphism, and overloading. Over
the years, several uses [3,6,11,24,25,38,39,52] and different variants [38,43,45,51]
have been proposed for order-sorted algebras, making them a solid starting
point for the development of a new framework. In particular, results on
rewriting logic [32] extend easily to the order-sorted case [31], thus facilitating
a future extension of this paper towards the operational semantics world.
Improvements of the order-sorted algebra framework have also been proposed
to model languages together with their type systems [10] and to extend
order-sorted specification with higher-order functions [38] (see [48] and [18] for detailed
surveys).

In this paper, we propose three different multi-language constructions according
to the semantic properties of boundary functions. The first one models a general
notion of multi-language that does not require any constraints on boundaries
(Sect. 3). We argue that when such generality is superfluous, we can achieve a
neater approach where boundary functions do not need to be annotated with
sorts. Indeed, we show that when the cross-language conversion of a term does
not depend on the sort at which the term is considered (i.e., when boundaries
are subsort polymorphic) the framework is powerful enough to apply the correct
conversion (Sect. 4.1). This last construction is an improvement of the original
notion of boundaries in [30]. From a practical point of view, it allows programmers
to avoid explicitly dealing with sorts when writing code, a non-trivial task
that could introduce type cast bugs in real world languages. Finally, we provide
a very specific notion of multi-language where no extra operator is added to the
syntax (Sect. 4.2). This approach is particularly useful to extend a language in a
modular fashion while ensuring backward compatibility with "old" programs.
For each one of these variants we prove an initiality theorem, which in turn
ensures the uniqueness of the multi-language semantics, thereby legitimating
the proposed framework. Moreover, we show that the framework guarantees a
fundamental closure property on the construction: The resulting multi-language
admits an order-sorted representation, i.e., it falls within the same formal model
of the combined languages. Finally, we model the multi-language designed in [30]
in order to show an instantiation of the framework (Sect. 6).


2 Background 


All the algebraic background of the paper was first stated in [15,17,19]. We
briefly introduce here the main definitions and results, and we illustrate them
on a simple running example.


Given a set of sorts $S$, an $S$-sorted set $A$ is a family of sets indexed by $S$, i.e.,
$A = \{ A_s \mid s \in S \}$. Similarly, an $S$-sorted function $f\colon A \to B$ is a family of
functions $f = \{ f_s\colon A_s \to B_s \mid s \in S \}$. We stick to the convention of using $s$ and
$w$ as metavariables for sorts in $S$ and $S^*$, respectively, and we use the blackboard
bold typeface to indicate a specific sort in $S$. In addition, if $A$ is an $S$-sorted set
and $w = s_1 \ldots s_n \in S^+$, we denote by $A_w$ the cartesian product $A_{s_1} \times \cdots \times A_{s_n}$.
Likewise, if $f$ is an $S$-sorted function and $a_i \in A_{s_i}$ for $i = 1, \ldots, n$, then the
function $f_w\colon A_w \to B_w$ is such that $f_w(a_1, \ldots, a_n) = (f_{s_1}(a_1), \ldots, f_{s_n}(a_n))$.
Given $P \subseteq S$, the restriction of an $S$-sorted function $f$ to $P$ is denoted by
$f|_P$ and it is the $P$-sorted function $f|_P = \{ f_s \mid s \in P \}$. Finally, if $g\colon A \to B$
is a function, we still use the symbol $g$ to denote the direct image map of $g$
(also called the additive lift of $g$), i.e., the function $g\colon \wp(A) \to \wp(B)$ such that
$g(X) = \{ g(a) \in B \mid a \in X \}$. Analogously, if $\leq$ is a binary relation on a set $A$
(with elements $a \in A$), we use the same relation symbol to denote its pointwise
extension, i.e., we write $a_1 \ldots a_n \leq a'_1 \ldots a'_n$ for $a_1 \leq a'_1, \ldots, a_n \leq a'_n$.
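The notational conventions above can be made concrete with a short Python sketch. The encoding is an assumption made purely for illustration: a sort is a string, an S-sorted function is a dictionary from sorts to Python callables, and a word w is a tuple of sorts. The names apply_at, restrict, direct_image, pointwise and example_f are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Sequence, Set, Tuple

Sort = str
SortedFunction = Dict[Sort, Callable[[Any], Any]]


def apply_at(f: SortedFunction, w: Sequence[Sort], args: Sequence[Any]) -> Tuple[Any, ...]:
    """f_w(a_1, ..., a_n) = (f_{s_1}(a_1), ..., f_{s_n}(a_n)) for w = s_1 ... s_n."""
    if len(w) != len(args):
        raise ValueError("w and args must have the same length")
    return tuple(f[s](a) for s, a in zip(w, args))


def restrict(f: SortedFunction, sorts: Iterable[Sort]) -> SortedFunction:
    """The restriction f|_P of an S-sorted function to a subset P of S."""
    return {s: f[s] for s in sorts}


def direct_image(g: Callable[[Any], Any], xs: Iterable[Any]) -> Set[Any]:
    """The additive lift of g: the image g(X) = { g(a) | a in X }."""
    return {g(a) for a in xs}


def pointwise(leq: Callable[[Any, Any], bool], u: Sequence[Any], v: Sequence[Any]) -> bool:
    """Pointwise extension of a binary relation to words of equal length."""
    return len(u) == len(v) and all(leq(a, b) for a, b in zip(u, v))


# A toy sorted function over the sorts "n" (numbers) and "s" (strings).
example_f: SortedFunction = {"n": lambda k: k + 1, "s": str.upper}
```

For instance, `apply_at(example_f, ("n", "s", "n"), (1, "ab", 2))` applies each component of `example_f` at the sort dictated by the word `("n", "s", "n")`.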


The basic notions underpinning the order-sorted algebra framework are the
definitions of signature, which models the symbols forming terms of the language, and
algebra, which provides an algebraic meaning to symbols.


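Definition 1 (Order-Sorted Signature). An order-sorted signature is a
triple $\langle S, \leq, \Sigma \rangle$, where $S$ is a set of sorts, $\leq$ is a binary relation on $S$, and
$\Sigma$ is an $S^* \times S$-sorted set $\Sigma = \{ \Sigma_{w,s} \mid w \in S^* \wedge s \in S \}$, satisfying the following
conditions:


(1os) $\langle S, \leq \rangle$ is a poset; and
(2os) $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ and $w_1 \leq w_2$ imply $s_1 \leq s_2$.


If $\sigma \in \Sigma_{w,s}$ (or, $\sigma\colon w \to s$ and $\sigma\colon s$ when $w = \varepsilon$, as shorthands), we call $\sigma$ an
operator (symbol) or function symbol, $w$ the arity, $s$ the sort, and $(w, s)$ the rank
of $\sigma$; if $w = \varepsilon$, we say that $\sigma$ is a constant (symbol). We name $\leq$ the subsort
relation and $\Sigma$ a signature when $\langle S, \leq \rangle$ is clear from the context. We abuse
notation and write $\sigma \in \Sigma$ when $\sigma \in \bigcup_{w,s} \Sigma_{w,s}$.

Definition 2 (Order-Sorted Algebra). An order-sorted $\langle S, \leq, \Sigma \rangle$-algebra $\mathcal{A}$
over an order-sorted signature $\langle S, \leq, \Sigma \rangle$ is an $S$-sorted set $A$ of interpretation
domains (or, carrier sets or semantic domains) $A = \{ A_s \mid s \in S \}$, together with
interpretation functions $\llbracket \sigma \rrbracket^{w,s}_{\mathcal{A}}\colon A_w \to A_s$ (or, if $w = \varepsilon$, $\llbracket \sigma \rrbracket^{\varepsilon,s}_{\mathcal{A}} \in A_s$)² for each
$\sigma \in \Sigma_{w,s}$, such that:


(1oa) $s \leq s'$ implies $A_s \subseteq A_{s'}$; and
(2oa) $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ and $w_1 \leq w_2$ imply that $\llbracket \sigma \rrbracket^{w_1,s_1}_{\mathcal{A}}(a) = \llbracket \sigma \rrbracket^{w_2,s_2}_{\mathcal{A}}(a)$
for each $a \in A_{w_1}$.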
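The two conditions of Definition 1 (the poset requirement and the monotonicity requirement on ranks) can be checked mechanically on a finite signature. The Python sketch below assumes a hypothetical encoding: sorts are strings, the subsort relation is a finite set of pairs, and each operator name maps to the set of its ranks (w, s) with w a tuple of sorts. The sample data is a finite fragment of the signature of the first running-example language, with only three numerals.

```python
from itertools import product
from typing import Dict, Set, Tuple

Sort = str
Word = Tuple[Sort, ...]
Rank = Tuple[Word, Sort]
Order = Set[Tuple[Sort, Sort]]


def is_poset(sorts: Set[Sort], leq: Order) -> bool:
    """Poset condition: leq is reflexive, antisymmetric and transitive on sorts."""
    reflexive = all((s, s) in leq for s in sorts)
    antisymmetric = all(a == b for (a, b) in leq if (b, a) in leq)
    transitive = all((a, d) in leq for (a, b) in leq for (c, d) in leq if b == c)
    return reflexive and antisymmetric and transitive


def leq_words(leq: Order, w1: Word, w2: Word) -> bool:
    """Pointwise extension of the subsort relation to words of sorts."""
    return len(w1) == len(w2) and all((a, b) in leq for a, b in zip(w1, w2))


def is_monotone(ops: Dict[str, Set[Rank]], leq: Order) -> bool:
    """Monotonicity: ranks (w1, s1), (w2, s2) of one operator with w1 <= w2 force s1 <= s2."""
    for ranks in ops.values():
        for (w1, s1), (w2, s2) in product(ranks, repeat=2):
            if leq_words(leq, w1, w2) and (s1, s2) not in leq:
                return False
    return True


# Finite fragment of the running-example signature: sorts e (expressions), n (naturals).
sorts1 = {"e", "n"}
leq1 = {("e", "e"), ("n", "n"), ("n", "e")}
ops1 = {
    "0": {((), "n")},
    "1": {((), "n")},
    "2": {((), "n")},
    "+": {(("e", "e"), "e")},
}

# A hypothetical operator violating monotonicity: n <= e on arguments, but e is not below n.
bad_ops = {"f": {(("n",), "e"), (("e",), "n")}}
```

Here `is_poset(sorts1, leq1)` and `is_monotone(ops1, leq1)` both hold, whereas `is_monotone(bad_ops, leq1)` fails because the more specific argument sort yields the less specific result sort.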


An important property of signatures, related to polymorphism, is regularity. Its 
relevance lies in the possibility of linking each term to a unique least sort (see 
Proposition 2.10 in [19]). 


Definition 3 (Regularity of an Order-Sorted Signature). An order-sorted
signature $\langle S, \leq, \Sigma \rangle$ is regular if for each $\sigma \in \Sigma_{\tilde{w},\tilde{s}}$ and for each lower bound
$w_0 \leq \tilde{w}$ the set $\{ (w, s) \mid \sigma \in \Sigma_{w,s} \wedge w_0 \leq w \}$ has minimum. This minimum is
called least rank of $\sigma$ with respect to $w_0$.
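For a finite signature, regularity can be tested by brute force. The sketch below is a Python illustration under explicit assumptions: ranks are compared componentwise, i.e., (w, s) is below (w', s') when w ≤ w' pointwise and s ≤ s'; the encoding of sorts and ranks as strings and tuples is ours, and the overloaded "+" and the operator "f" are hypothetical examples that do not come from the paper.

```python
from itertools import product
from typing import Dict, Optional, Set, Tuple

Sort = str
Word = Tuple[Sort, ...]
Rank = Tuple[Word, Sort]
Order = Set[Tuple[Sort, Sort]]


def leq_words(leq: Order, w1: Word, w2: Word) -> bool:
    return len(w1) == len(w2) and all((a, b) in leq for a, b in zip(w1, w2))


def least_rank(ranks: Set[Rank], leq: Order, w0: Word) -> Optional[Rank]:
    """Minimum of { (w, s) in ranks | w0 <= w } under the componentwise order, if any."""
    candidates = [(w, s) for (w, s) in ranks if leq_words(leq, w0, w)]
    for (w, s) in candidates:
        if all(leq_words(leq, w, w2) and (s, s2) in leq for (w2, s2) in candidates):
            return (w, s)
    return None


def is_regular(sorts: Set[Sort], ops: Dict[str, Set[Rank]], leq: Order) -> bool:
    """Every lower bound w0 of the arity of some rank must admit a least rank."""
    for ranks in ops.values():
        for (w_top, _) in ranks:
            for w0 in product(sorted(sorts), repeat=len(w_top)):
                if leq_words(leq, w0, w_top) and least_rank(ranks, leq, w0) is None:
                    return False
    return True


# Hypothetical overloading of "+" on naturals (n) and expressions (e), with n <= e.
leq1 = {("e", "e"), ("n", "n"), ("n", "e")}
plus_ranks = {(("n", "n"), "n"), (("e", "e"), "e")}

# Hypothetical non-regular signature: c <= a and c <= b, with a and b incomparable.
leq2 = {("a", "a"), ("b", "b"), ("c", "c"), ("c", "a"), ("c", "b")}
f_ranks = {(("a",), "a"), (("b",), "b")}
```

In the second signature, an argument of sort `c` fits both ranks of `f`, and neither rank is below the other, so no least rank exists and a term `f(t)` with `t` of sort `c` would have no least sort.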


The freely generated algebra $\mathcal{T}_\Sigma$ over a given signature $G = \langle S, \leq, \Sigma \rangle$ provides
the notion of term with respect to $G$.


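Definition 4 (Order-Sorted Term Algebra). Let $\langle S, \leq, \Sigma \rangle$ be an order-sorted
signature. The order-sorted term $\langle S, \leq, \Sigma \rangle$-algebra $\mathcal{T}_\Sigma$ is an order-sorted
algebra such that:


- The $S$-sorted set $T_\Sigma = \{ T_{\Sigma,s} \mid s \in S \}$ is inductively defined as the least family
satisfying:
(1ot) $\Sigma_{\varepsilon,s} \subseteq T_{\Sigma,s}$;
(2ot) $s \leq s'$ implies $T_{\Sigma,s} \subseteq T_{\Sigma,s'}$; and
(3ot) $\sigma \in \Sigma_{w,s}$, $w = s_1 \ldots s_n \in S^+$, and $t_i \in T_{\Sigma,s_i}$ for $i = 1, \ldots, n$ imply
$\sigma(t_1 \ldots t_n) \in T_{\Sigma,s}$.
- For each $\sigma \in \Sigma_{w,s}$ the interpretation function $\llbracket \sigma \rrbracket^{w,s}_{\mathcal{T}_\Sigma}\colon T_{\Sigma,w} \to T_{\Sigma,s}$ is defined
as
(4ot) $\llbracket \sigma \rrbracket^{\varepsilon,s}_{\mathcal{T}_\Sigma} = \sigma$ if $\sigma \in \Sigma_{\varepsilon,s}$; and
(5ot) $\llbracket \sigma \rrbracket^{w,s}_{\mathcal{T}_\Sigma}(t_1, \ldots, t_n) = \sigma(t_1 \ldots t_n)$ if $\sigma \in \Sigma_{w,s}$, $w = s_1 \ldots s_n \in S^+$, and
$t_i \in T_{\Sigma,s_i}$ for $i = 1, \ldots, n$.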
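Membership in the carriers of the term algebra can be decided by structural recursion on a term. The following Python sketch is an illustration under our own encoding, not the paper's: a term is a nested tuple whose first component is the operator name and whose remaining components are the argument terms, and the subsort relation is a finite, transitive set of pairs. The function has_sort and the sample signature (a fragment of the first running-example language with ten numerals) are hypothetical names and data.

```python
from typing import Dict, Set, Tuple

Sort = str
Word = Tuple[Sort, ...]
Rank = Tuple[Word, Sort]
Order = Set[Tuple[Sort, Sort]]
Term = tuple  # (operator_name, subterm_1, ..., subterm_n)


def has_sort(term: Term, s: Sort, ops: Dict[str, Set[Rank]], leq: Order) -> bool:
    """Decide whether `term` belongs to the carrier of sort `s` of the term algebra.

    A term sigma(t_1 ... t_n) has sort s when sigma has a rank (s_1 ... s_n, s0)
    with s0 <= s and each t_i has sort s_i. Constants are the case n = 0, and the
    test s0 <= s accounts for closure under the subsort relation.
    """
    op, args = term[0], term[1:]
    for (w, s0) in ops.get(op, set()):
        if (s0, s) not in leq or len(w) != len(args):
            continue
        if all(has_sort(t, si, ops, leq) for t, si in zip(args, w)):
            return True
    return False


# Fragment of the running example: numerals of sort n, addition on sort e, n <= e.
leq1 = {("e", "e"), ("n", "n"), ("n", "e")}
ops1: Dict[str, Set[Rank]] = {str(k): {((), "n")} for k in range(10)}
ops1["+"] = {(("e", "e"), "e")}

one = ("1",)
sum_term = ("+", ("1",), ("+", ("2",), ("3",)))
```

With this data, the numeral `one` has both sorts `n` and `e` (by subsorting), while `sum_term` has sort `e` only.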


Homomorphisms between algebras capture the compositional nature of semantics:
The meaning of a term is determined by the meanings of its constituents.
They are defined as order-sorted functions that preserve the interpretation of
operators.


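2 To be pedantic, we should introduce the one-point domain $A_\varepsilon = \{\bullet\}$ and then
define $\llbracket \sigma \rrbracket^{\varepsilon,s}_{\mathcal{A}}(\bullet) \in A_s$.

$e ::= n \mid e + e$ where $n \in \mathbb{N}$
(a) The BNF grammar of $L_1$.

$s ::= - \mid a \mid s + s$ where $a \in \mathbb{A}$
(b) The BNF grammar of $L_2$.


Fig. 1. The BNF grammars of the running example languages.


$\llbracket n \rrbracket = n$
$\llbracket e + e \rrbracket = \llbracket e \rrbracket + \llbracket e \rrbracket$
(a) The formal semantics of $L_1$.

$\llbracket - \rrbracket = \varepsilon$
$\llbracket a \rrbracket = a$
$\llbracket s + - \rrbracket = \llbracket - + s \rrbracket = \llbracket s \rrbracket$
$\llbracket s + - + s \rrbracket = \llbracket s + s \rrbracket$
$\llbracket a_0 + \ldots + a_n \rrbracket = a_0 \ldots a_n \quad (n > 0)$
(b) The formal semantics of $L_2$.


Fig. 2. The two formal semantics of the running example languages.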


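Definition 5 (Order-Sorted Homomorphism). Let $\mathcal{A}$ and $\mathcal{B}$ be $\langle S, \leq, \Sigma \rangle$-algebras.
An order-sorted $\langle S, \leq, \Sigma \rangle$-homomorphism from $\mathcal{A}$ to $\mathcal{B}$, denoted by $h\colon \mathcal{A} \to \mathcal{B}$,
is an $S$-sorted function $h\colon A \to B = \{ h_s\colon A_s \to B_s \mid s \in S \}$ such that:


(1oh) $h_s(\llbracket \sigma \rrbracket^{w,s}_{\mathcal{A}}(a)) = \llbracket \sigma \rrbracket^{w,s}_{\mathcal{B}}(h_w(a))$ for each $\sigma \in \Sigma_{w,s}$ and $a \in A_w$; and
(2oh) $s \leq s'$ implies $h_s(a) = h_{s'}(a)$ for each $a \in A_s$.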


The class of all the order-sorted $\langle S, \leq, \Sigma \rangle$-algebras and the class of all
order-sorted $\langle S, \leq, \Sigma \rangle$-homomorphisms form a category denoted by $\mathbf{OSAlg}(S, \leq, \Sigma)$.
Furthermore, the homomorphism definition determines the property of the term
algebra $\mathcal{T}_\Sigma$ of being an initial object in its category whenever the signature is
regular. Since initiality is preserved by isomorphisms, this allows us to identify $\mathcal{T}_\Sigma$
with the abstract syntax of the language. If $\mathcal{T}_\Sigma$ is initial, the homomorphism
leaving $\mathcal{T}_\Sigma$ and going to an algebra $\mathcal{A}$ is called the semantic function (with
respect to $\mathcal{A}$).


Example. Let $L_1$ and $L_2$ be two formal languages (see Fig. 1). The former is a
language to construct simple mathematical expressions: $n \in \mathbb{N}$ is the metavariable
for natural numbers, while $e$ inductively generates all the possible additions
(Fig. 1a). The latter is a language to build strings over a finite alphabet of symbols
$\mathbb{A} = \{a, b, \ldots, z\}$: $a \in \mathbb{A}$ is the metavariable for atoms (or, characters),
whereas $s$ concatenates them into strings (Fig. 1b). A term in $L_1$ and $L_2$ denotes
an element in the sets $\mathbb{N}$ and $\mathbb{A}^*$, according to the equations in Fig. 2a and b,
respectively.

The syntax of the language $L_1$ can be modeled by an order-sorted signature
$\mathfrak{S}_1 = (S_1, \le_1, \Sigma_1)$ defined as follows: $S_1 = \{e, n\}$, a set with sorts $e$ (which stands for
expressions) and $n$ (which stands for natural numbers); $\le_1$ is the reflexive relation on
$S_1$ plus $n \le_1 e$ (natural numbers are expressions); and the operators in $\Sigma_1$ are
$0, 1, 2, \ldots\colon n$ and $+\colon e\,e \to e$. Similarly, the signature $\mathfrak{S}_2 = (S_2, \le_2, \Sigma_2)$ models
the syntax of the language $L_2$: the set $S_2 = \{s, a\}$ carries the sort for strings
$s$ and the sort for atomic symbols (or characters) $a$; the subsort relation $\le_2$ is
the reflexive relation on $S_2$ plus $a \le_2 s$ (characters are one-symbol strings); and
the operator symbols in $\Sigma_2$ are $\mathtt{a}, \ldots, \mathtt{z}\colon a$, $\varepsilon\colon s$, and $+\colon s\,s \to s$. The semantics of $L_1$
and $L_2$ can be embodied by algebras $\mathcal{A}_1$ and $\mathcal{A}_2$ over the signatures $\mathfrak{S}_1$ and $\mathfrak{S}_2$,
respectively. We set the interpretation domains of $\mathcal{A}_1$ to $A^1_n = A^1_e = \mathbb{N}$ and those
of $\mathcal{A}_2$ to $A^2_a = A \subseteq A^* = A^2_s$. Moreover, we define the interpretation functions
as follows (the juxtaposition of two or more strings denotes their concatenation,
and we use $\hat{a}$ as a metavariable ranging over $A^*$):


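    $[\![n]\!]_{\mathcal{A}_1} = n$                                  $[\![\varepsilon]\!]_{\mathcal{A}_2} = \varepsilon$
    $[\![+]\!]_{\mathcal{A}_1}(n_1, n_2) = n_1 + n_2$                $[\![a]\!]_{\mathcal{A}_2} = a$
                                                   $[\![+]\!]_{\mathcal{A}_2}(\hat{a}_1, \hat{a}_2) = \hat{a}_1\hat{a}_2$

Since $\mathfrak{S}_1$ and $\mathfrak{S}_2$ are regular, $\mathcal{A}_1$ and $\mathcal{A}_2$ induce the semantic functions
$h_1\colon \mathcal{T}_{\Sigma_1} \to \mathcal{A}_1$ and $h_2\colon \mathcal{T}_{\Sigma_2} \to \mathcal{A}_2$, providing semantics to the languages.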
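
As an informal illustration of the two semantic functions, the following Python sketch evaluates terms of the two running example languages by structural recursion. The tuple encoding of terms and the names `sem_l1` and `sem_l2` are assumptions made for this sketch only; they do not appear in the formal development.

```python
# Hypothetical term encoding (not from the paper):
#   L1 terms: ("num", k) | ("add", t1, t2)
#   L2 terms: ("eps",)   | ("chr", c) | ("add", t1, t2)

def sem_l1(term):
    """Sketch of h_1: maps an L1 term to a natural number."""
    tag = term[0]
    if tag == "num":
        return term[1]
    if tag == "add":
        return sem_l1(term[1]) + sem_l1(term[2])
    raise ValueError(f"not an L1 term: {term!r}")

def sem_l2(term):
    """Sketch of h_2: maps an L2 term to a string over the alphabet."""
    tag = term[0]
    if tag == "eps":
        return ""
    if tag == "chr":
        return term[1]
    if tag == "add":
        # In L2 the operator + is interpreted as concatenation.
        return sem_l2(term[1]) + sem_l2(term[2])
    raise ValueError(f"not an L2 term: {term!r}")
```

Each function is determined by how it acts on the operators, which is the computational reading of initiality: once the algebra is fixed, there is only one way to extend it to all terms.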


3 Combining Order-Sorted Theories 


The first step towards a multi-language specification is the choice of which terms
of one language can be employed in the others [30,35,36]. For instance, a
multi-language requirement could demand that ML expressions be usable in place of Scheme
expressions and, possibly, but not necessarily, vice versa (such a multi-language is
designed in [30]). A multi-language signature is an amenable formalism to specify
the compatibility relation between syntactic categories across two languages.


Definition 6 (Multi-Language Signature). A multi-language signature is a
triple $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$, where $\mathfrak{S}_1 = (S_1, \le_1, \Sigma_1)$ and $\mathfrak{S}_2 = (S_2, \le_2, \Sigma_2)$ are
order-sorted signatures, and $\le$ is a binary relation on $S = S_1 \cup S_2$ that satisfies
the following condition:


(1s) $s, s' \in S_i$ implies $s \le s'$ if and only if $s \le_i s'$, for $i = 1, 2$.


To make the notation lighter, we introduce the following binary relations on $S$:
$s \ltimes s'$ if $s \le s'$ but neither $s \le_1 s'$ nor $s \le_2 s'$, and $s \preccurlyeq s'$ if $s \le s'$ but not $s \ltimes s'$.


In the following, we always assume that the sets of sorts $S_1$ and $S_2$ of the
order-sorted signatures $\mathfrak{S}_1$ and $\mathfrak{S}_2$ are disjoint.³ Condition (1s) requires the
multi-language subsort relation $\le$ to preserve the original subsort relations $\le_1$ and $\le_2$
(i.e., $\le \cap\, (S_i \times S_i) = \le_i$). The join relation $\ltimes$ provides a compatibility relation
between sorts⁴ in $\mathfrak{S}_1$ and $\mathfrak{S}_2$. More precisely, $S_i \ni s \ltimes s' \in S_j$ suggests that we
want to use terms in $\mathcal{T}_{\Sigma_i,s}$ in place of terms in $\mathcal{T}_{\Sigma_j,s'}$, whereas the intra-language
subsort relation $\preccurlyeq$ shifts the standard notion of subsort from the order-sorted to
the multi-language world. In a nutshell, the relation $\le \;=\; \ltimes \cup \preccurlyeq$ can only join
(through $\ltimes$) the underlying languages without introducing distortions (indeed,
$\preccurlyeq \;=\; \le_1 \cup \le_2$).


3 This hypothesis is non-restrictive: We can always perform a renaming of the sorts.

4 Sorts may be understood as syntactic categories, in the sense of formal grammars.
Given a context-free grammar $G$, it is possible to define a many-sorted signature $\Sigma_G$
where non-terminals become sorts and such that each term $t$ in the term algebra
$\mathcal{T}_{\Sigma_G}$ is isomorphic to the parse tree of $t$ with respect to $G$ (see [15] for details).
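
As a small illustration of the relations introduced after Definition 6, the following Python sketch splits a multi-language subsort relation into its join part and its intra-language part, and checks condition (1s). Relations are encoded as sets of pairs; the function names and the sample data (the sorts of the running example with four join pairs) are assumptions made for this sketch.

```python
def split_relation(leq, leq1, leq2):
    """Split the multi-language relation into (join, intra-language) parts."""
    join = {p for p in leq if p not in leq1 and p not in leq2}
    intra = leq - join
    return join, intra

def satisfies_1s(leq, sorts_i, leq_i):
    """Condition (1s): restricted to S_i x S_i, leq coincides with leq_i."""
    restricted = {(s, t) for (s, t) in leq if s in sorts_i and t in sorts_i}
    return restricted == set(leq_i)

# Hypothetical sample data: S1 = {e, n}, S2 = {s, a}.
LEQ1 = {("e", "e"), ("n", "n"), ("n", "e")}
LEQ2 = {("s", "s"), ("a", "a"), ("a", "s")}
JOIN_PAIRS = {("e", "a"), ("n", "a"), ("s", "n"), ("a", "n")}
LEQ = LEQ1 | LEQ2 | JOIN_PAIRS

JOIN, INTRA = split_relation(LEQ, LEQ1, LEQ2)
```

Here `JOIN` recovers exactly the cross-language pairs, and `INTRA` is the union of the two original subsort relations, matching the observation that the intra-language relation introduces no distortion.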

The role of an algebra is to provide an interpretation domain for each sort,
as well as the meaning of every operator symbol in a given signature. When
moving towards the multi-language context, the join relation $\ltimes$ may add subsort
constraints between sorts belonging to different signatures. Consequently, if $s \ltimes s'$,
a multi-language algebra has to specify how values of sort $s$ may be interpreted
as values of sort $s'$. These specifications are called boundary functions [30] and
provide an algebraic meaning to the subsort constraints added by $\ltimes$. Henceforth,
we define $S = S_1 \cup S_2$, $\Sigma = \Sigma_1 \cup \Sigma_2$, and, given $(w, s) \in S_i^* \times S_i$, we denote by
$\Sigma^i_{w,s}$ the $(w, s)$-sorted component in $\Sigma_i$.


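Definition 7 (Multi-Language Algebra). Let $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ be a
multi-language signature. A multi-language $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebra $\mathcal{A}$ is an $S$-sorted
set $A$ of interpretation domains (or carrier sets, or semantic domains)
$A = \{\, A_s \mid s \in S \,\}$, together with interpretation functions
$[\![\sigma]\!]_{\mathcal{A}}^{w,s}\colon A_w \to A_s$ for each $\sigma \in \Sigma_{w,s}$, and with a $\ltimes$-sorted set $\alpha$ of
boundary functions $\alpha = \{\, \alpha_{s,s'}\colon A_s \to A_{s'} \mid s \ltimes s' \,\}$, such that the following
constraint holds:


(1a) the projected algebra $\mathcal{A}_i$, where $i = 1, 2$, specified by the carrier set
$A^i = \{\, A^i_s = A_s \mid s \in S_i \,\}$ and interpretation functions
$[\![\sigma]\!]_{\mathcal{A}_i}^{w,s} = [\![\sigma]\!]_{\mathcal{A}}^{w,s}$ for each $\sigma \in \Sigma^i_{w,s}$,
must be an order-sorted $\mathfrak{S}_i$-algebra.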


If $\mathcal{M}$ is an algebra, we adopt the convention of denoting by $M$ (standard math
font) its carrier set and by $\mu$ (Greek math font) its boundary functions whenever
possible. Condition (1a) is the semantic counterpart of condition (1s): It requires
the multi-language to carry (i.e., preserve) the order-sorted algebras of the
underlying languages, whereas the boundary functions model how values can flow between
languages.

Given two multi-language $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebras $\mathcal{A}$ and $\mathcal{B}$, we can define
morphisms between them that preserve the sorted structure of the underlying
projected algebras.


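Definition 8 (Multi-Language Homomorphism). Let $\mathcal{A}$ and $\mathcal{B}$ be
multi-language $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebras with sets of boundary functions $\alpha$ and $\beta$,
respectively. A multi-language $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-homomorphism $h\colon \mathcal{A} \to \mathcal{B}$ is an $S$-sorted
function $h\colon A \to B$ such that:


(1h) the restriction $h|_{S_i}$ is an order-sorted $\mathfrak{S}_i$-homomorphism $h|_{S_i}\colon \mathcal{A}_i \to \mathcal{B}_i$,
for $i = 1, 2$; and
(2h) $s \ltimes s'$ implies $h_{s'} \circ \alpha_{s,s'} = \beta_{s,s'} \circ h_s$.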


Conditions (1h) and (2h) are easily intelligible when the domain algebra is the
abstract syntax of the language [15]: Simply put, both conditions require the
semantics of a term to be a function of the meaning of its subterms, in the sense
of [15,46]. In particular, the second condition demands that boundary functions
act as operators.⁵

The identity homomorphism on a multi-language algebra $\mathcal{A}$ is denoted by
$id_{\mathcal{A}}$, and it is the set-theoretic identity on the carrier set $A$ of the algebra $\mathcal{A}$.
The composition of two homomorphisms $f\colon \mathcal{A} \to \mathcal{B}$ and $g\colon \mathcal{B} \to \mathcal{C}$ is defined as
the sorted function composition $g \circ f\colon \mathcal{A} \to \mathcal{C}$, thus $id_{\mathcal{B}} \circ f = f = f \circ id_{\mathcal{A}}$, and
associativity follows easily from the definition of $\circ$.


Proposition 1. Multi-language homomorphisms are closed under composition. 


Hence, as in the many-sorted and order-sorted case [15,19], we have immediately 
the category of all the multi-language algebras over a multi-language signature: 


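Theorem 1. Let $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ be a multi-language signature. The class of all
$(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebras and the class of all $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-homomorphisms form a
category denoted by $\mathbf{Alg}(\mathfrak{S}_1, \mathfrak{S}_2, \le)$.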


3.1 The Initial Term Model 


In this section, we introduce the concepts of (multi-language) term and 
(multi-language) semantics in order to show how a multi-language algebra 
yields a unique interpretation for any regular (see Definition 11) multi-language 
specification. 


Multi-language terms should comprise all the terms of the underlying languages, plus
those obtained by merging the two languages according to the join relation
$\ltimes$. In particular, we aim for a construction where subterms of sort $s'$ may have
been replaced by terms of sort $s$, whenever $s \ltimes s'$ (we recall that $s$ and $s'$ are two
syntactic categories of different languages due to Definition 6). Nonetheless, we
must be careful not to add ambiguities during this process: A term $t$ may belong
to both the $\mathfrak{S}_1$ and $\mathfrak{S}_2$ term algebras but with different meanings $[\![t]\!]_{\mathcal{A}_1}$ and $[\![t]\!]_{\mathcal{A}_2}$
(assuming that $\mathcal{A}_1$ and $\mathcal{A}_2$ are algebras over $\mathfrak{S}_1$ and $\mathfrak{S}_2$, respectively). When $t$ is
included in the multi-language, we lose the information needed to determine which
of the two interpretations to choose, thus making the (multi-language) semantics of
$t$ ambiguous. The same problem arises whenever an operator $\sigma$ belongs to both
languages with different interpretation functions. The simplest solution to avoid
such issues is to add syntactical notations that make explicit the context of the
language in which we are operating.


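Definition 9 (Associated Signature). The associated signature to the
multi-language signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ is the ordered triple $(S, \preccurlyeq, \Pi)$, where $S = S_1 \cup S_2$,
$\preccurlyeq \;=\; \le_1 \cup \le_2$, and

    $\Pi = \{\, \sigma_1\colon w \to s \mid \sigma\colon w \to s \in \Sigma_1 \,\}$
        $\cup\; \{\, \sigma_2\colon w \to s \mid \sigma\colon w \to s \in \Sigma_2 \,\}$
        $\cup\; \{\, \hookrightarrow_{s,s'}\colon s \to s' \mid s \ltimes s' \,\}$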
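
The construction of Definition 9 can be sketched in Python as follows. Operators are encoded as triples (symbol, arity, sort), annotated symbols as pairs (symbol, language index), and boundary operators as tagged triples; this encoding and the function name are assumptions made for illustration.

```python
def associated_signature(sigma1, sigma2, join):
    """Build the operator set of the associated signature.

    sigma1, sigma2: sets of (symbol, arity, sort), with arity a tuple of sorts.
    join: set of pairs (s, s2) in the join relation.
    """
    # Operators of the first language, tagged with subscript 1.
    pi = {((sym, 1), w, s) for (sym, w, s) in sigma1}
    # Operators of the second language, tagged with subscript 2.
    pi |= {((sym, 2), w, s) for (sym, w, s) in sigma2}
    # One unary boundary operator for every pair in the join relation.
    pi |= {(("boundary", s, s2), (s,), s2) for (s, s2) in join}
    return pi

# Hypothetical fragment of the running example.
SIGMA1 = {("+", ("e", "e"), "e"), ("10", (), "n")}
SIGMA2 = {("+", ("s", "s"), "s"), ("f", (), "a")}
PI = associated_signature(SIGMA1, SIGMA2, {("e", "a"), ("s", "n")})
```

Because of the tags, the shared symbol `+` yields two distinct operators in `PI`, which is how the associated signature avoids the ambiguity discussed above.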


5 This is essential in order to generalize the concept of syntactical boundary functions
of [30] to semantic-only functions in Sect. 4.2.

It is trivial to prove that an associated signature is indeed an order-sorted
signature, thus admitting a term algebra $\mathcal{T}_\Pi$. All the symbols forming terms in $\mathcal{T}_\Pi$
carry the source language information as a subscript, and all the new operators
$\hookrightarrow_{s,s'}$ specify when a term of sort $s$ is used in place of a term of sort $s'$.
Although $\mathcal{T}_\Pi$ seems a suitable definition for multi-language terms, it is not a
multi-language algebra according to Definition 7. However, we can exploit the
construction of $\mathcal{T}_\Pi$ in order to provide a fully-fledged multi-language algebra
able to generate multi-language terms.


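Definition 10 (Multi-Language Term Algebra). The multi-language term
algebra $\mathcal{T}$ over a multi-language signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ with boundary functions
$\tau$ is defined as follows:


(1t) $s \in S$ implies $T_s = T_{\Pi,s}$;

(2t) $\sigma \in \Sigma^i_{w,s}$ implies $[\![\sigma]\!]_{\mathcal{T}}^{w,s} = [\![\sigma_i]\!]_{\mathcal{T}_\Pi}^{w,s}$ for $i = 1, 2$; and

(3t) $s \ltimes s'$ implies $\tau_{s,s'} = [\![\hookrightarrow_{s,s'}]\!]_{\mathcal{T}_\Pi}^{s,s'}$.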

Proving that $\mathcal{T}$ satisfies Definition 7 is easy and omitted. $\mathcal{T}$ and $\mathcal{T}_\Pi$ share the
same carrier sets (condition (1t)), and each single-language operator $\sigma \in \Sigma^i_{w,s}$ is
interpreted as its annotated version $\sigma_i$ in $\mathcal{T}_\Pi$ (condition (2t)). Furthermore, the
multi-language operators $\hookrightarrow_{s,s'}$ no longer belong to the signature (they
belong neither to $\mathfrak{S}_1$ nor to $\mathfrak{S}_2$), but their semantics is inherited by the boundary
functions $\tau$ (condition (3t)), while their syntactic values are still in the carrier
sets of the algebra (this construction is highly technical and very similar to the
freely generated $\Sigma(X)$-algebra over a set of variables $X$, see [15]).

Note that this is exactly the formalization of the ad hoc multi-language
specifications in [2,30,36,37]: [2,36,37] exploit distinct colors to disambiguate
the source language of the operators, whereas [30] uses different font styles for
different languages. Moreover, boundary functions in [30] conceptually match
the introduced operators $\hookrightarrow_{s,s'}$.


The last step in order to finalize the framework is to provide semantics for each
term in $\mathcal{T}$. As with the order-sorted case, we need a notion of regularity for
proving the initiality of the term algebra in its category, which in turn ensures
a single eligible (initial algebra) semantics.


Definition 11 (Regularity). A multi-language signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ is
regular if its associated signature $(S, \preccurlyeq, \Pi)$ is regular.


Proposition 2. The associated signature $(S, \preccurlyeq, \Pi)$ of a multi-language
signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ is regular if and only if $\mathfrak{S}_1$ and $\mathfrak{S}_2$ are regular.


The last proposition makes it unnecessary to check the multi-language regularity
whenever the regularity of the order-sorted signatures is known.


Theorem 2 (Initiality of $\mathcal{T}$). The multi-language term algebra $\mathcal{T}$ over a
regular multi-language signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ is initial in the category
$\mathbf{Alg}(\mathfrak{S}_1, \mathfrak{S}_2, \le)$.

Initiality of $\mathcal{T}$ is essential to assign a unique mathematical meaning to each
term, as in the order-sorted case: Given a multi-language algebra $\mathcal{A}$, there is
only one way of interpreting each term $t \in T$ in $\mathcal{A}$ (satisfying the homomorphism
conditions).


Definition 12 ((Multi-Language) Semantics). Let $\mathcal{A}$ be a multi-language
algebra over a regular multi-language signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$. The
(multi-language) semantics of a (multi-language) term $t \in T$ induced by $\mathcal{A}$ is defined as


    $[\![t]\!]_{\mathcal{A}} = h_{ls(t)}(t)$


The last equation is well-defined since $h$ is the unique multi-language
homomorphism $h\colon \mathcal{T} \to \mathcal{A}$ and for each $t \in T$ there exists a least sort $ls(t) \in S$ such that
$t \in T_{ls(t)}$ (see Prop. 2.10 in [19]).


Example. Suppose we are interested in a multi-language over the signatures
$\mathfrak{S}_1$ and $\mathfrak{S}_2$ specified in the example given in the background section that
satisfies the following properties:


- Terms denoting natural numbers can be used in place of characters $a \in A$
according to the function $chr\colon \mathbb{N} \to A$ that maps the natural number $n$ to the
character symbol $a_{(n \bmod |A|)}$ (we are assuming a total lexicographical order
on the indexed symbols $a_i$ of $A$);

- Terms denoting strings can be used in place of natural numbers $n \in \mathbb{N}$
according to the function $ord\colon A \to \mathbb{N}$, which is the inverse of $chr$ restricted to the initial
segment of the natural numbers $\mathbb{N}_{<|A|}$.


In order to achieve such a multi-language specification, we can simply provide
a join relation $\ltimes$ on $S$ and a boundary function $\alpha_{s,s'}$ for each extra-language
subsort relation $s \ltimes s'$ introduced by $\ltimes$. We define the join relation and the
boundary functions as follows:


    $e \ltimes a \;\land\; n \ltimes a$:    $\alpha_{e,a}(n) = \alpha_{n,a}(n) = chr(n)$

    $s \ltimes n \;\land\; a \ltimes n$:    $\alpha_{a,n}(a) = ord(a)$
                          $\alpha_{s,n}(a_1 \cdots a_n) = \sum_{k=1}^{n} \alpha_{a,n}(a_k) \cdot 10^{\,n-k}$


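The multi-language $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebra $\mathcal{A}$ can now be obtained by joining the
projected algebras $\mathcal{A}_1$ and $\mathcal{A}_2$ with the set of boundary functions $\alpha$. The term
algebra $\mathcal{T}$ over $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ provides all the multi-language terms, and Theorem 2
ensures a unique denotation of each $t \in T$ in $\mathcal{A}$. For instance, the term


    $t = \hookrightarrow_{s,n}(+_2(f_2, +_2(o_2, \hookrightarrow_{e,a}(+_1(10_1, 5_1)))))$    (1)

with subterms $t_1 = +_2(f_2, t_2)$, $t_2 = +_2(o_2, t_3)$, $t_3 = \hookrightarrow_{e,a}(t_4)$, and
$t_4 = +_1(10_1, 5_1)$, is syntactically equivalent to the following but with a less pedantic notation,
where language subscripts are replaced by colors (red for one, and blue for two)
and prefix notation is replaced by infix notation


    $\hookrightarrow_{s,n}(f + o + \hookrightarrow_{e,a}(10 + 5))$


and it denotes the natural number 765:


    $[\![t_4]\!]_{\mathcal{A}} = h_{ls(t_4)}(t_4) = h_e(t_4) = [\![+]\!]_{\mathcal{A}}^{ee,e}([\![10]\!]_{\mathcal{A}}, [\![5]\!]_{\mathcal{A}}) = [\![+]\!]_{\mathcal{A}}^{ee,e}(10, 5) = 15$
    $[\![t_3]\!]_{\mathcal{A}} = h_{ls(t_3)}(t_3) = h_a(t_3) = \alpha_{e,a}([\![t_4]\!]_{\mathcal{A}}) = \alpha_{e,a}(15) = o$
    $[\![t_2]\!]_{\mathcal{A}} = h_{ls(t_2)}(t_2) = h_s(t_2) = [\![+]\!]_{\mathcal{A}}^{ss,s}([\![o]\!]_{\mathcal{A}}, [\![t_3]\!]_{\mathcal{A}}) = [\![+]\!]_{\mathcal{A}}^{ss,s}(o, o) = oo$
    $[\![t_1]\!]_{\mathcal{A}} = h_{ls(t_1)}(t_1) = h_s(t_1) = [\![+]\!]_{\mathcal{A}}^{ss,s}([\![f]\!]_{\mathcal{A}}, [\![t_2]\!]_{\mathcal{A}}) = [\![+]\!]_{\mathcal{A}}^{ss,s}(f, oo) = foo$
    $[\![t]\!]_{\mathcal{A}} = h_{ls(t)}(t) = h_n(t) = \alpha_{s,n}([\![t_1]\!]_{\mathcal{A}}) = \alpha_{s,n}(foo) = 765$

(see the proof of Prop. 2.10 in [19] to check how to compute the least sort of a
term).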
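
The computation above can be replayed with a short Python sketch. Two points are assumptions of the sketch rather than statements of the text: characters are numbered from 1 (so that `a` has position 1 and `o` has position 15), and `chr_` is extended cyclically to all naturals. This numbering is chosen because it reproduces the value 765 reported above. The tuple encoding of terms is also hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase  # the alphabet A = {a, ..., z}

def ord_(c):
    """Assumed 1-based position of a character (a -> 1, ..., z -> 26)."""
    return ALPHABET.index(c) + 1

def chr_(n):
    """Assumed inverse of ord_ on 1..26, extended cyclically to all naturals."""
    return ALPHABET[(n - 1) % len(ALPHABET)]

def str_to_nat(w):
    """Boundary function from strings to naturals: positions read as base-10 'digits'."""
    return sum(ord_(c) * 10 ** (len(w) - k) for k, c in enumerate(w, start=1))

# One boundary function per pair in the join relation.
BOUNDARY = {
    ("e", "a"): chr_, ("n", "a"): chr_,
    ("a", "n"): ord_, ("s", "n"): str_to_nat,
}

def evaluate(term):
    """Evaluate ('lit', v) | ('add', lang, t1, t2) | ('to', src, dst, t)."""
    tag = term[0]
    if tag == "lit":
        return term[1]
    if tag == "add":
        # +_1 is the sum and +_2 is concatenation; Python's + covers both cases.
        return evaluate(term[2]) + evaluate(term[3])
    if tag == "to":
        _, src, dst, sub = term
        return BOUNDARY[(src, dst)](evaluate(sub))
    raise ValueError(f"unknown term: {term!r}")

T4 = ("add", 1, ("lit", 10), ("lit", 5))
T3 = ("to", "e", "a", T4)
T2 = ("add", 2, ("lit", "o"), T3)
T1 = ("add", 2, ("lit", "f"), T2)
T = ("to", "s", "n", T1)
```

Evaluating `T` follows the same bottom-up order as the equations: the sum is computed first, converted to a character, concatenated twice, and finally converted back to a natural number.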


4 Refining the Construction 


The construction in Sect. 3 does not set any constraint on boundary functions,
thus giving a great deal of flexibility to language designers. For instance, they
can provide boundary functions that act differently with respect to the
intra-language subsort relation $\preccurlyeq$: According to the previous example, it would have
been possible to define $\alpha_{n,a} \neq \alpha_{e,a}$ to employ different value conversion
specifications for terms in $T_n$, based on whether they are used as natural numbers ($n$) or
as expressions ($e$). However, when this amount of flexibility is not needed, we can
refine the previous construction by reducing the amount of syntax introduced
by the associated signature. In this section we examine


- the case where boundary functions satisfy the monotonicity conditions of
order-sorted algebra operators (Sect. 4.1); and

- the case where boundary functions commute with the semantics of operator
symbols (Sect. 4.2).


In both cases, we prove that the introduced refinements do not affect the initiality 
of the term algebra, thereby providing unambiguous semantics to the multi- 
language. 


4.1 Subsort Polymorphic Boundary Functions 


In Sect. 3, the join relation constraints $s \ltimes s'$ are turned into syntactical operators
$\hookrightarrow_{s,s'}$ in the associated signature $(S, \preccurlyeq, \Pi)$. We now show how to handle all
the syntactical overhead introduced by $\ltimes$ with a single polymorphic operator
$\hookrightarrow$ whenever the boundary functions satisfy the monotonicity conditions of the
order-sorted algebras [19]. Such conditions require a subsort relation $s_1 \le s_2$
between the sorts of a polymorphic operator $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$, assuming that
$w_1 \le w_2$. In our case, $\sigma = \hookrightarrow$, and thus we extend Definition 6 with the following
ad hoc constraint (2s*):


Definition 6* (SP Multi-Language Signature). A subsort polymorphic
(SP) multi-language signature is a multi-language signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ such
that


(2s*) $s_1 \ltimes s_1'$, $s_2 \ltimes s_2'$, and $s_1 \preccurlyeq s_2$ imply $s_1' \preccurlyeq s_2'$.


Furthermore, order-sorted algebras demand consistency of the interpretation
functions of a subsort polymorphic operator on the smaller domain, which
results in the following condition (2a*) on boundary functions (that extends
Definition 7):


Definition 7* (SP Multi-Language Algebra). Let $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ be an SP
multi-language signature. A subsort polymorphic (SP) multi-language $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-
algebra is a multi-language $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebra $\mathcal{A}$ such that


(2a*) $s_1 \ltimes s_1'$, $s_2 \ltimes s_2'$, and $s_1 \preccurlyeq s_2$ imply that $\alpha_{s_1,s_1'}(a) = \alpha_{s_2,s_2'}(a)$ for each
$a \in A_{s_1}$.


The notion of homomorphism in this new context does not change (a
homomorphism between two SP algebras is still an $S$-sorted function decomposable
into two order-sorted homomorphisms that commute with boundaries), whereas
the associated signature to an SP multi-language signature merely differs from
Definition 9 in having a single polymorphic operator $\hookrightarrow$ instead of a family of
parametrized symbols $\{\, \hookrightarrow_{s,s'}\colon s \to s' \mid s \ltimes s' \,\}$.


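Definition 9* (SP Associated Signature). The subsort polymorphic (SP)
associated signature to the SP multi-language signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ is the
ordered triple $(S, \preccurlyeq, \Pi)$, where $S = S_1 \cup S_2$, $\preccurlyeq \;=\; \le_1 \cup \le_2$, and


    $\Pi = \{\, \sigma_1\colon w \to s \mid \sigma\colon w \to s \in \Sigma_1 \,\}$
        $\cup\; \{\, \sigma_2\colon w \to s \mid \sigma\colon w \to s \in \Sigma_2 \,\}$
        $\cup\; \{\, \hookrightarrow\colon s \to s' \mid s \ltimes s' \,\}$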


Since the associated signature is the basis for the term algebra, we need to modify
the condition (3t) in Definition 10:


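Definition 10* (SP Multi-Language Term Algebra). The subsort
polymorphic (SP) multi-language term algebra $\mathcal{T}$ over an SP multi-language
signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ with boundary functions $\tau$ is defined as follows:


(1t) $s \in S$ implies $T_s = T_{\Pi,s}$;
(2t) $\sigma \in \Sigma^i_{w,s}$ implies $[\![\sigma]\!]_{\mathcal{T}}^{w,s} = [\![\sigma_i]\!]_{\mathcal{T}_\Pi}^{w,s}$ for $i = 1, 2$; and
(3t) $s \ltimes s'$ implies $\tau_{s,s'} = [\![\hookrightarrow]\!]_{\mathcal{T}_\Pi}^{s,s'}$.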


Signature regularity is still defined as in Definition 11, and Proposition 2 still
holds for the extended version developed in this section. As a result, the
SP multi-language term $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebra $\mathcal{T}$ is still initial in the category
$\mathbf{Alg}^{*}(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ of SP multi-language algebras over the SP multi-language
signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$.


Theorem 3. Let $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ be an SP multi-language signature. The class of all
SP $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebras and the class of all $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-homomorphisms form a
category denoted by $\mathbf{Alg}^{*}(\mathfrak{S}_1, \mathfrak{S}_2, \le)$.


Theorem 4 (Initiality of $\mathcal{T}$). The SP multi-language term algebra $\mathcal{T}$ over
a regular SP multi-language signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ is initial in the category
$\mathbf{Alg}^{*}(\mathfrak{S}_1, \mathfrak{S}_2, \le)$.


The semantics of a term $t$ induced by an SP multi-language algebra $\mathcal{A}$ is defined
in the same way as in Definition 12, thanks to the initiality result: $[\![t]\!]_{\mathcal{A}} = h_{ls(t)}(t)$.
The main advantage of dealing with SP multi-language terms is that the
framework is able to determine the correct interpretation function of the operator
$\hookrightarrow$, making the subscript notation developed in the previous section
superfluous. This also means that programmers are exempted from explicitly annotating
multi-language programs with sorts, a non-trivial task in the general case that
could introduce type cast bugs.


Example. The boundary functions of the previous example are subsort
polymorphic: $\alpha_{a,n}(a) = ord(a) = \alpha_{s,n}(a)$ for each character $a \in A$, and $\alpha_{n,a} = \alpha_{e,a}$ by
definition. Thus, the equivalent of the term $t$ (see Eq. 1) in the SP term algebra is


    $t = \hookrightarrow(+_2(f_2, +_2(o_2, \hookrightarrow(+_1(10_1, 5_1)))))$    (2)

or, according to the previous notation,

    $\hookrightarrow(f + o + \hookrightarrow(10 + 5))$


which denotes the same natural number 765.
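
The subsort polymorphism of the string-to-natural boundary functions can be checked mechanically on the alphabet. The sketch below assumes, as an illustration only, that characters are numbered from 1 (a choice that is consistent with the value 765 reported for the example); the function names are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase  # the alphabet A = {a, ..., z}

def ord_(c):
    """Boundary function from characters to naturals (assumed 1-based position)."""
    return ALPHABET.index(c) + 1

def str_to_nat(w):
    """Boundary function from strings to naturals: positions weighted by powers of ten."""
    return sum(ord_(c) * 10 ** (len(w) - k) for k, c in enumerate(w, start=1))

def check_2a_star():
    """Condition (2a*): both functions agree on the smaller domain A."""
    return all(ord_(c) == str_to_nat(c) for c in ALPHABET)
```

On a one-character string the sum has a single summand with exponent zero, so the two functions coincide, which is what the check confirms for every character.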


4.2 Semantic-Only Boundary Functions 


In the previous section, we have shown how to handle the flow of values across
different languages with a single polymorphic operator. Now, we present a new
multi-language construction where neither extra operators are added to the
associated signature, nor single-language operators have to be annotated with
subscripts indicating their original language. Thus, the resulting multi-language
syntax comprises only symbols in $\Sigma_1 \cup \Sigma_2$. Such a construction is achieved by:


- Imposing commutativity conditions on algebras, making homomorphisms
transparently inherit the semantics of boundary functions. The framework
is therefore able to apply the correct value conversion function whenever
necessary, without the need for an explicit syntactical operator $\hookrightarrow$.

- Requiring a new form of cross-language polymorphism able to cope with
shared operators among languages. The initiality of term algebras is
preserved by modifying the notion of signature in such a way that every operator
admits a least sort.

The variant of the framework presented in this section is particularly useful
when designing the extension of a language in a modular fashion. For instance,
if the signature $\mathfrak{S}_1$ models the syntax of a simple functional language (for an
example, see [15, p. 77]) without an explicit encoding for string values, and $\mathfrak{S}_2$
is a language for manipulating strings (similar to the language $L_2$ of the running
example of this paper), we can exploit the construction presented below in order
to embed $\mathfrak{S}_2$ into $\mathfrak{S}_1$.


Signature. The main issue that can arise at the level of the multi-language
signature is the presence of shared operators in $\Sigma_1$ and $\Sigma_2$. Contrary to the previous
cases, where such ambiguity is resolved by adding subscripts in the associated
signature, the trade-off here is requiring ad hoc or subsort polymorphism across
signatures.


Definition 6⋆ (SO Multi-Language Signature). A semantic-only (SO)
multi-language signature is a multi-language signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ such that


(2s⋆) $(S, \le)$ is a poset; and
(3s⋆) $\sigma \in \Sigma^i_{w_1,s_1} \cap \Sigma^j_{w_2,s_2}$ and $w_1 \ltimes w_2$ imply $s_1 \ltimes s_2$ with $i, j = 1, 2$ and $i \neq j$.


Condition (2s⋆) forces the subsort relation to be directed, avoiding
symmetry of syntactic categories (this is typical when modeling language
extensions), while condition (3s⋆) shifts the monotonicity condition of order-sorted
signatures to syntactically equal operators in $\Sigma_1 \cap \Sigma_2$.

The associated signature is defined without adding extra symbols to the
signature, i.e., $\Pi = \Sigma_1 \cup \Sigma_2$, and deliberately conflating the relations $\preccurlyeq$ and $\ltimes$
in $\le$:


Definition 9⋆ (SO Associated Signature). The SO associated signature to
the SO multi-language signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ is the ordered triple $(S, \le, \Pi)$, where
$S = S_1 \cup S_2$, $\le \;=\; \preccurlyeq \cup \ltimes$, and $\Pi = \Sigma_1 \cup \Sigma_2$.


The embedding of $\ltimes$ in $\le$ (i.e., $\ltimes \subseteq \le$) in the associated signature enables the
order-sorted term algebra construction to automatically build multi-language
terms, without the need for an explicit operator $\hookrightarrow$ that acts as a bridge between
syntactic categories. It is easy to see that the term algebra over the associated
signature is precisely the symbol-free version of the multi-language described at the
beginning.

Unfortunately, multi-language regularity no longer follows from
single-language regularity, nor vice versa (see Figs. 3 and 4).⁶ More formally,
Proposition 2 does not hold in this new context:


6 A (horizontal) arrow from an arity symbol $w$ to a sort $s$ labelled with an operator
symbol $\sigma$ is an alternative shorthand for $\sigma\colon w \to s$. A (vertical) single line between
two sorts $s$ below $s'$ labelled with a binary relation $\le$ means that $s \le s'$ (if the
binary relation is the join relation $\ltimes$, the line is doubled). A dotted rectangle around
operators is a graphical representation of the set of ranks $(w, s)$ that must have a
minimum element (red arrows) in order for the signature to be regular.

(a) The Hasse-like diagrams of the regular signatures $\mathfrak{S}_1$ (left) and $\mathfrak{S}_2$ (right).
(b) The Hasse-like diagram of the non-regular multi-language signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$.


Fig. 3. A non-regular multi-language signature comprising two regular order-sorted
signatures.

(a) The Hasse-like diagrams of signatures $\mathfrak{S}_1$ (non-regular, left) and $\mathfrak{S}_2$ (regular, right).
(b) The Hasse-like diagram of the regular multi-language signature $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$.


Fig. 4. A regular multi-language signature comprising a non-regular order-sorted
signature.


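Suppose $S_1 = \{\bar{w}, \bar{s}\}$, $S_2 = \{w_0, w, s\}$, $\le_1$ and $\le_2$ to be the reflexive
relations on $S_1$ and $S_2$, respectively, plus $w_0 \le_2 w$, and $\sigma \in \Sigma^1_{\bar{w},\bar{s}} \cap \Sigma^2_{w,s}$.
If the join relation $\ltimes$ is defined as $w_0 \ltimes \bar{w}$ and $s \ltimes \bar{s}$, the resulting
associated signature is no longer regular, although $\mathfrak{S}_1$ and $\mathfrak{S}_2$ are regular
(Fig. 3a). In Fig. 3b, it is easy to see that $\sigma \in \Sigma_{\bar{w},\bar{s}}$ and $w_0 \le \bar{w}$, but the set
$\{\, (w', s') \mid \sigma \in \Sigma_{w',s'} \land w_0 \le w' \,\} = \{\, (\bar{w}, \bar{s}), (w, s) \,\}$ does not have a least element
w.r.t. $w_0$.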

On the other hand, let $S_1 = \{\bar{w}, w_0, w_1, \bar{s}\}$, $S_2 = \{w_2, s_2\}$, $\le_1$ and $\le_2$
be the reflexive relations on $S_1$ and $S_2$, respectively, plus $w_0 \le_1 \bar{w}$ and
$w_0 \le_1 w_1$, and $\sigma \in \Sigma^1_{\bar{w},\bar{s}} \cap \Sigma^1_{w_1,\bar{s}} \cap \Sigma^2_{w_2,s_2}$. If the join relation $\ltimes$ is defined
as $w_2 \ltimes \bar{w}$, $w_2 \ltimes w_1$, $w_0 \ltimes w_2$, and $s_2 \ltimes \bar{s}$, the resulting associated
signature is regular (Fig. 4a), although $\mathfrak{S}_1$ is not: given $\sigma \in \Sigma_{\bar{w},\bar{s}}$ and $w_0 \le \bar{w}$, the
set $\{\, (w', s') \mid \sigma \in \Sigma_{w',s'} \land w_0 \le w' \,\} = \{\, (\bar{w}, \bar{s}), (w_1, \bar{s}), (w_2, s_2) \,\}$ has least element
$(w_2, s_2)$ w.r.t. $w_0$ (Fig. 4b).

A positive result can be obtained by recalling that regularity is easier to check
when $(S, \le)$ satisfies the descending chain condition (DCC):


Lemma 1 (Regularity over DCC poset [19]). An order-sorted signature $\Sigma$
over a DCC poset $(S, \le)$ is regular if and only if whenever $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$
and there is some $w_0 \le w_1, w_2$, then there is some $w \le w_1, w_2$ such that $\sigma \in \Sigma_{w,s}$
and $w_0 \le w$.


At this point, we can relate the DCC of the poset $(S, \le)$ in the associated
signature of $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ to the DCC of $(S_1, \le_1)$ and $(S_2, \le_2)$:


Proposition 3. Let $(S, \le, \Pi)$ be the associated signature of $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$. Then,
$(S, \le)$ is DCC if and only if $(S_1, \le_1)$ and $(S_2, \le_2)$ are DCC.


As a result, whenever we know that $(S_1, \le_1)$ and $(S_2, \le_2)$ are DCC, we can
check the regularity of $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ by employing Lemma 1 without checking
whether $(S, \le)$ is DCC.
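
For finite signatures, regularity can also be tested mechanically. The following Python sketch checks the definition of regularity directly (every lower bound of an arity admits a least rank among those above it) rather than the criterion of Lemma 1. The encoding of signatures, the function names, and the sort names `wb` and `sb` (standing for the barred sorts of the Fig. 3 counterexample) are assumptions of the sketch.

```python
from itertools import product

def closure(sorts, pairs):
    """Reflexive-transitive closure of `pairs` over the finite set `sorts`."""
    leq = {(s, s) for s in sorts} | set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq

def word_leq(leq, w1, w2):
    """Pointwise extension of the subsort relation to arities (tuples of sorts)."""
    return len(w1) == len(w2) and all((x, y) in leq for x, y in zip(w1, w2))

def is_regular(sorts, leq, ops):
    """ops maps each operator symbol to its set of ranks (arity, sort)."""
    for ranks in ops.values():
        for arity, _ in ranks:
            for w0 in product(sorted(sorts), repeat=len(arity)):
                if not word_leq(leq, w0, arity):
                    continue
                # Ranks of the operator that lie above the lower bound w0.
                above = [(w, s) for (w, s) in ranks if word_leq(leq, w0, w)]
                if not any(all(word_leq(leq, w, w2) and (s, s2) in leq
                               for (w2, s2) in above) for (w, s) in above):
                    return False
    return True

# Sorts e, n, s, a with n <= e, a <= s, and the join s <= n (closed under transitivity).
SORTS = {"e", "n", "s", "a"}
LEQ = closure(SORTS, {("n", "e"), ("a", "s"), ("s", "n")})
OPS = {"+": {(("e", "e"), "e"), (("s", "s"), "s")},
       "0": {((), "n")}, "eps": {((), "s")}}

# Fig. 3 counterexample: w0 lies below both w and wb, which are incomparable.
FIG3_SORTS = {"w0", "w", "s", "wb", "sb"}
FIG3_LEQ = closure(FIG3_SORTS, {("w0", "w"), ("w0", "wb"), ("s", "sb")})
FIG3_OPS = {"sigma": {(("wb",), "sb"), (("w",), "s")}}
```

The exhaustive search over lower bounds is exponential in the arity, so this is only meant for small examples such as the ones discussed in this section.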


Algebra. In this multi-language construction, the behaviour of boundary
functions is no longer bound to syntactical operators as in the previous
sections, but it is inherited by homomorphisms. A necessary condition to
accomplish this aim is the commutativity of interpretation functions with boundary
functions:


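Definition 7⋆ (SO Multi-Language Algebra). Let $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ be an
SO multi-language signature. A semantic-only (SO) multi-language $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-
algebra is an SP multi-language $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebra $\mathcal{A}$ such that


(3a⋆) $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ and $w_1 \ltimes w_2$ imply that
$\alpha_{s_1,s_2}([\![\sigma]\!]_{\mathcal{A}}^{w_1,s_1}(a)) = [\![\sigma]\!]_{\mathcal{A}}^{w_2,s_2}(\alpha_{w_1,w_2}(a))$ for each $a \in A_{w_1}$.


Note that $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ and $w_1 \ltimes w_2$ imply $s_1 \ltimes s_2$ by condition (3s⋆). The
notion of homomorphism remains unchanged from Definition 8 (to understand
how the homomorphisms inherit the behaviour of the boundary functions, see the proof
of Theorem 6).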

The term algebra is defined similarly to Definition 10, except for boundary
functions:


Definition 10⋆ (SO Multi-Language Term Algebra). The semantic-only
(SO) multi-language term algebra $\mathcal{T}$ over an SO multi-language signature
$(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ with boundary functions $\tau$ is defined as follows:


(1t) $s \in S$ implies $T_s = T_{\Pi,s}$;
(2t⋆) $\sigma \in \Sigma_{w,s}$ implies $[\![\sigma]\!]_{\mathcal{T}}^{w,s} = [\![\sigma]\!]_{\mathcal{T}_\Pi}^{w,s}$; and
(3t⋆) $s \ltimes s'$ implies $\tau_{s,s'} = id_{T_s}$.


Since the subsort relation $\le$ includes the join relation $\ltimes$, $s \ltimes s'$ implies
$T_{\Pi,s} = T_s \subseteq T_{s'} = T_{\Pi,s'}$. Thus, the boundary function $\tau_{s,s'}$ can be defined as the
identity on the smaller domain (note that it trivially satisfies the commutativity
condition (3a⋆)).

Proposition 4. Let $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ be an SO multi-language signature. Then,
the SO multi-language term $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebra is a proper SO multi-language
algebra.


Theorem 5. Let $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ be an SO multi-language signature. The class of all
SO $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebras and the class of all $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-homomorphisms form a
category denoted by $\mathbf{Alg}^{\star}(\mathfrak{S}_1, \mathfrak{S}_2, \le)$.

We can now prove the initiality of $\mathcal{T}$ in its category.


Theorem 6 (Initiality of $\mathcal{T}$). Let $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$ be a regular SO multi-language
signature. Then, the term algebra $\mathcal{T}$ is an initial object in the category
$\mathbf{Alg}^{\star}(\mathfrak{S}_1, \mathfrak{S}_2, \le)$.


Thanks to the initiality of the term algebra, the definition of term semantics is
the same as in Definition 12.


Example. Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be two order-sorted algebras over the signatures
$\mathfrak{S}_1$ and $\mathfrak{S}_2$, respectively, as formalized in the example in Sect. 3. Suppose we are
interested in a new multi-language $\mathcal{A}$ over $\mathfrak{S}_1$ and $\mathfrak{S}_2$ such that any string
expression $t$ of sort $s$ in $\mathfrak{S}_2$ can denote the natural number $length([\![t]\!]_{\mathcal{A}_2})$ when
embedded in $\mathfrak{S}_1$ terms. For instance, we require that $[\![10 + 5]\!]_{\mathcal{A}} = [\![10 + 5]\!]_{\mathcal{A}_1} = 15$
and $[\![f + o]\!]_{\mathcal{A}} = [\![f + o]\!]_{\mathcal{A}_2} = fo$, but $[\![(f + o) + (10 + 5)]\!]_{\mathcal{A}} = [\![fo + 15]\!]_{\mathcal{A}} = 17$
(parentheses in the last term have only been used to disambiguate the parsing
result).

Since the requirements demand the use of string expressions in place of natural
numbers, the join relation $\ltimes$ shall define $s \ltimes n$ and ensure transitivity, hence
$s \ltimes e$, $a \ltimes n$, and $a \ltimes e$.

The signatures $\mathfrak{S}_1$ and $\mathfrak{S}_2$ are trivially regular. However, by merging $\mathfrak{S}_1$
and $\mathfrak{S}_2$, we are causing subsort polymorphism on the symbol $+$, which is used
as sum operator in $\mathcal{A}_1$ and as concatenation operator in $\mathcal{A}_2$, and therefore we
have to check the regularity: Let $w_1 = ee$, $w_2 = ss$, $s_1 = e$, and $s_2 = s$. Given
$+ \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ and the lower bound $w_0 = aa \le w_1, w_2$, there exists
$w = ss$ such that $w \le w_1, w_2$ and $+ \in \Sigma_{w,s}$, where $s = s \le s_1, s_2$ (we have
employed Lemma 1 thanks to Proposition 3). Analogously, when $w_0 \le w_1, w_2$,
the relative least rank is $(ss, s)$.

The multi-language $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebra $\mathcal{A}$ is now defined by joining the
projected algebras $\mathcal{A}_1$ and $\mathcal{A}_2$ and by defining boundary functions $\alpha_{s,s'}$ for each
$s \ltimes s'$ that convert strings into naturals (their length) when strings are used
in place of naturals:


    $\alpha_{a,n}(a) = \alpha_{a,e}(a) = 1$        $\alpha_{s,n}(\hat{a}) = \alpha_{s,e}(\hat{a}) = length(\hat{a})$


The above definition of boundary functions satisfies both conditions (2a*)
and (3a⋆).

The initiality theorem yields the semantic homomorphism from $\mathcal{T}$ to $\mathcal{A}$. For
instance, suppose we want to compute the semantics of the term


    $t = +(+(f, o), +(10, 5))$,    where $t_1 = +(f, o)$ and $t_2 = +(10, 5)$.


The least sorts of $t$, $t_1$, and $t_2$ are $e$, $s$, and $e$, respectively. The operator $+$ belongs
to both $\Sigma_{ee,e}$ and $\Sigma_{ss,s}$, and its least rank w.r.t. the lower bound $ls(t_1)\,ls(t_2) = se$
is $(ee, e)$. By Definition 12 we have


    $[\![t]\!]_{\mathcal{A}} = h_e(t) = [\![+]\!]_{\mathcal{A}}^{ee,e}(h_e(t_1), h_e(t_2))$


At this point, since $ls(t_1) = s$ and $ls(f) = ls(o) = a$, the least rank of the
root symbol $+$ of $t_1$ w.r.t. the lower bound $ls(f)\,ls(o) = aa$ is $(ss, s)$, thus


    $h_e(t_1) = \alpha_{s,e}(h_s(t_1)) = \alpha_{s,e}([\![+]\!]_{\mathcal{A}}^{ss,s}(h_s(f), h_s(o))) = \alpha_{s,e}([\![+]\!]_{\mathcal{A}}^{ss,s}(f, o)) = \alpha_{s,e}(fo) = 2$


Similarly, $ls(t_2) = e$ and $ls(10) = ls(5) = n$. Then, the least rank of the root
symbol $+$ of $t_2$ w.r.t. the lower bound $nn$ is $(ee, e)$ and therefore we have


    $h_e(t_2) = [\![+]\!]_{\mathcal{A}}^{ee,e}(h_n(10), h_n(5)) = [\![+]\!]_{\mathcal{A}}^{ee,e}(10, 5) = 15$

Finally,


    $[\![t]\!]_{\mathcal{A}} = h_e(t) = [\![+]\!]_{\mathcal{A}}^{ee,e}(h_e(t_1), h_e(t_2)) = [\![+]\!]_{\mathcal{A}}^{ee,e}(2, 15) = 17$


as desired.
We can observe that without any syntactical operator the framework is still 
able to apply the correct boundary functions to move values across languages. 
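
The evaluation strategy of this example can be sketched in Python: each subterm is evaluated together with its least sort, the least rank of `+` is selected from the sorts of the arguments, and string values are converted by the boundary functions only when the sum rank is chosen. The term encoding and the names `leq`, `to_nat`, and `evaluate` are assumptions of the sketch, not part of the framework.

```python
# Strict part of the subsort relation: n <= e, a <= s, and the join pairs.
LEQ = {("n", "e"), ("a", "s"), ("s", "n"), ("s", "e"), ("a", "n"), ("a", "e")}

def leq(s, t):
    """Subsort test on the merged sort set {e, n, s, a}."""
    return s == t or (s, t) in LEQ

def to_nat(value):
    """Boundary functions into n and e: a string is converted to its length."""
    return len(value) if isinstance(value, str) else value

def evaluate(term):
    """Return (least sort, value) for ('nat', k) | ('chr', c) | ('eps',) | ('add', t1, t2)."""
    tag = term[0]
    if tag == "nat":
        return "n", term[1]
    if tag == "chr":
        return "a", term[1]
    if tag == "eps":
        return "s", ""
    s1, v1 = evaluate(term[1])
    s2, v2 = evaluate(term[2])
    if leq(s1, "s") and leq(s2, "s"):
        # Least rank (ss, s): both arguments are strings, so + is concatenation.
        return "s", v1 + v2
    # Least rank (ee, e): + is the sum, applied after converting string arguments.
    return "e", to_nat(v1) + to_nat(v2)

T1 = ("add", ("chr", "f"), ("chr", "o"))
T2 = ("add", ("nat", 10), ("nat", 5))
T = ("add", T1, T2)
```

No conversion operator appears in `T`; the conversion of `fo` to 2 is triggered solely by the sort information, mirroring how the homomorphism inherits the boundary functions.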


5 Reduction to Order-Sorted Algebra 


The constructions in the previous sections raise the question of whether a
multi-language algebra admits an equivalent order-sorted representation. Conceptually,
it would mean that being a multi-language is essentially a matter of perspective:
By forgetting how the multi-language has been constructed, what is left is simply
an ordinary language. Mathematically speaking, it requires us to exhibit a
reduction functor $F$ from the multi-language category to an order-sorted one, such
that there is an isomorphism $\phi$ between the carrier sets of the multi-language
term $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebra $\mathcal{T}$ and $F(\mathcal{T})$, and such that $[\![t]\!]_{\mathcal{A}} = [\![\phi(t)]\!]_{F(\mathcal{A})}$ for each
$t \in T$ and for each multi-language $(\mathfrak{S}_1, \mathfrak{S}_2, \le)$-algebra $\mathcal{A}$.

In the following, we denote the reduction functor by $F$, $F^{*}$, and $F^{\star}$
according to whether its domain is the category $\mathbf{Alg}(\mathfrak{S}_1, \mathfrak{S}_2, \le)$, $\mathbf{Alg}^{*}(\mathfrak{S}_1, \mathfrak{S}_2, \le)$, or
$\mathbf{Alg}^{\star}(\mathfrak{S}_1, \mathfrak{S}_2, \le)$, respectively.

In the case of Alg(G,, Gz, <) and Alg* (61, G2, <) categories, the construc- 
tion of F and F* is very simple, and we illustrate it only for the plain multi- 
language algebras of Sect.3: Let A be a multi-language (G1, G2, <)-algebra. 
Then, we define the order-sorted (S, =, I7)-algebra Ay (called the associated 
order-sorted algebra of A) by setting

(17) Ans = As for each s € S; 
(27) foi)” = Jo] 7° for each o € Xi, and i = 1,2; and 


(37) [es] = Qs, s for each s x s. 


If A and B are multi-language (61, G2, <)-algebras, and h is a multi-language 
(61, G2, <)-homomorphism from A to B, the functor F maps A and B to their 
associated order-sorted algebras Aj and By and the homomorphism h to itself. 
Since Ay = A, the isomorphism ¢ is the identity function. 


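Theorem 7. $F : \mathrm{Alg}(G_1, G_2, \ltimes) \to \mathrm{OSAlg}(G_1, G_2, \ltimes)$ is a functor for every multi-language signature $(G_1, G_2, \ltimes)$. Moreover, $[\![t]\!]_{\mathcal{A}} = [\![t]\!]_{F(\mathcal{A})}$ for each $t \in \mathcal{T}$ and for each multi-language $(G_1, G_2, \ltimes)$-algebra $\mathcal{A}$.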


If A is an SP multi-language (G1, G2, <)-algebra, the construction of the reduc- 
tion functor F* is similar to the definition of F. The only difference is the 
equation in the condition (37) that turns into 


(37*) ee = Qs y for each s x s’. 


Finally, the definition of F* starting from the category Alg*(61, 62, <) of 
SO multi-language algebras is slightly different. We define F* as a map 
from the multi-language category Alg* (61, 62, <) to the order-sorted category 
OSAlg(S, <, X). We denote the reduction of a multi-language algebra A and 
a homomorphism h: A — B as F(A) = A, and F(h) = hi: Ay — B,. The 
order-sorted algebra A, has the same carrier sets of the multi-language algebra 
A, i.e., A, = A, and interpretation functions [o]% = [o]7°. Furthermore, we 
define h, = h. Intuitively, the algebra A, is formally defined simply by forgetting 
about the boundary functions, while the homomorphism h,: A, — B, inherits 
their semantics from h. Again, the isomorphism ¢ is the identity. 


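Theorem 8. $F^* : \mathrm{Alg}^*(G_1, G_2, \ltimes) \to \mathrm{OSAlg}(S, \le, \Sigma)$ is a functor for every SO multi-language signature $(G_1, G_2, \ltimes)$. Moreover, $[\![t]\!]_{\mathcal{A}} = [\![t]\!]_{F^*(\mathcal{A})}$ for each $t \in \mathcal{T}$ and for each SO multi-language $(G_1, G_2, \ltimes)$-algebra $\mathcal{A}$.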


Unfortunately, even though T is an initial algebra in its category, F*(T) = TJ, is 
not: Given two multi-language algebras A and A’ that differ only in the boundary 
functions (we denote by a and a’ the families of boundary functions of A and 
A’, respectively) they both get mapped by F* to the same order-sorted algebra 
A,. Thus, if h: T — A and h’: T — A’ are the unique homomorphisms going 
from T to A and A’, the functor F maps them to two different order-sorted 
homomorphisms h: 7, — A, and h|: T, — A, both leaving 7, and going to 
A,, hence losing the uniqueness property. However, this does not pose a problem 
once a family of boundary functions has been fixed:


Theorem 9. Let T be the multi-language term (61,62, <)-algebra and A be 
an order-sorted (S, 3, 5’)-algebra. Given a family of boundary functions a = 
{ass |s x s’} such that satisfies condition (3a*), there exists a unique order- 
sorted (S, =<, 5')-homomorphism h*: T, — A commuting with a, i.e., ifs x s’, 
then hX (t) = ass (ho (t)) for each t € Ts. 

The reduction theorems presented in this section have a strong consequence: 
all the already known results for the order-sorted algebras can be lifted to the 
multi-language world. 


6 An Example of Multi-Language Construction 


The first theoretical paper addressing the problem of multi-language construction is [30]. The authors study the so-called natural embedding (a more realistic improvement of the lump embedding [7,30,34,40]), in which Scheme terms can be converted to equivalent ML terms, and vice versa.⁷ The novelty of their approach lies in how they define boundaries in order to translate values from Scheme to ML. Indeed, the latter does not admit an equivalent representation for each Scheme function. Their solution is to “represent a Scheme procedure in ML at type $\tau_1 \to \tau_2$ by a new procedure that takes an argument of type $\tau_1$, converts it to a Scheme equivalent, runs the original Scheme procedure on that value, and then converts the result back to ML at type $\tau_2$”.
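
The quoted recipe is essentially a wrapper around the Scheme procedure. A minimal Python sketch follows, in which Python callables stand in for procedures of both languages and naturals are the only base values. All names (`embed_procedure`, `nat_to_scheme`, `nat_to_ml`) are hypothetical and chosen for illustration; they are not taken from [30].

```python
def nat_to_scheme(ml_value):
    """Assumed conversion of an ML natural to Scheme (the same number)."""
    return ml_value


def nat_to_ml(scheme_value):
    """Assumed conversion of a Scheme value to an ML natural, with a dynamic check."""
    is_natural = (
        isinstance(scheme_value, int)
        and not isinstance(scheme_value, bool)
        and scheme_value >= 0
    )
    if not is_natural:
        raise TypeError("Scheme value is not a natural number")
    return scheme_value


def embed_procedure(scheme_proc, arg_to_scheme, result_to_ml):
    """Represent a Scheme procedure as an ML procedure at type tau1 -> tau2."""
    def ml_proc(ml_arg):
        scheme_arg = arg_to_scheme(ml_arg)       # convert the argument to Scheme
        scheme_result = scheme_proc(scheme_arg)  # run the original procedure
        return result_to_ml(scheme_result)       # convert the result back to ML
    return ml_proc


scheme_successor = lambda v: v + 1
ml_successor = embed_procedure(scheme_successor, nat_to_scheme, nat_to_ml)
```

The check inside `nat_to_ml` only hints at the dynamic checks mentioned later in this section; a Scheme procedure returning a value of the wrong shape makes the wrapper fail instead of producing an ill-typed ML value.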

Our goal here is not to give a full presentation of the ML and Scheme languages in the form of order-sorted algebras, but rather to show how we can model the natural embedding construction in our framework. To this end, we sketch a formalization of the syntax and semantics of Scheme and ML, and we refer the reader to [30] for the details of both languages.

To provide the semantics of Scheme, we follow the approach of Goguen et al. [15], where the denotational semantics of the simple applicative language (SAL) introduced by Reynolds [42] is given by means of an algebra, exploiting the initiality theorem. Such a language is a “syntactically sugared” version of the untyped lambda calculus with the fixpoint operator, which in turn is very similar to Scheme.

Let X = {x1,X2,...} be a set of variables and N° be the naturals lattice 
with T and L adjoined. From [46], there exists a complete lattice V such that 
satisfies the isomorphism ¢: V = N° + V o> V, where + is the disjoint union 
with minimum and maximum elements identified, and V o> V is the complete 
lattice of Scott-continuous functions from V to V. Given £ € { N°, V o> V }, we 
define the injections je: € + N°? +V o> V and ig = ¢71 0 je, and the projection 
te: V — E such that me(v) = (¢(v) € € ? (v) L). The set of all Scheme 
environments is the lattice of all total functions P = X — V with componentwise 
ordering p E p’ if and only if p(x) E p'(x) in V for all x € X. Furthermore, we 
define auxiliary functions (see [15] for a more detailed explanation) in order to 
provide the semantics of the language (in the following, z € X and n € N°): 


— $get_x : P \to V$, $get_x(\rho) = \rho(x)$ (evaluation function);
— $val_n : P \to V$, $val_n(\rho) = n$ (n-constant function);


⁷ To be specific, the authors combine “an extended model of the untyped call-by-value lambda calculus, which is used as a stand-in for Scheme, and an extended model of the simply-typed lambda calculus, which is used as a stand-in for ML”.

— $put_x : P \times V \to P$, $put_x(\rho, v) = \rho[v/x]$, where $\rho[v/x](x') = (x = x' \;?\; v : \rho(x'))$ (environment updating);

— app: V? > V, app(v1, v2) = (tTvo+v(v1))(v2) (function application); 

— nat? :V —V, nat?(v) = (v € N° ? valo s valı ) (natural predicate); 

— proc? : V — V, proc#(v) = [v € V o— V ? vab s valı ) (function predicate); 

— given ê: P — V for 1 < i < k, then (é,...,é,): P — VF is defined by 
qê1,-.., êk) (p) = (€1(p),---, €x(p)) (target-tupling); and 

— given D, D’ and D”, then abs: ((D x D’) D”) > (D (D' D")) is 
defined by ((abs(f))(x))(y) = f(x,y) (abstraction); and 

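— $choice : V^3 \to V$ (conditional function), $add : V^2 \to V$ (addition), and $sub : V^2 \to V$ (subtraction):

$$choice(v_1, v_2, v_3) = \begin{cases} \top & \text{if } v_1 = \top \\ v_2 & \text{if } v_1 = 0 \\ v_3 & \text{if } v_1 \neq 0 \\ \bot & \text{otherwise} \end{cases} \qquad add(v_1, v_2) = \begin{cases} \top & \text{if } v_1, v_2 = \top \\ v_1 + v_2 & \text{if } v_1, v_2 \in \mathbb{N} \\ \bot & \text{otherwise} \end{cases}$$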


The definition of $sub$ is analogous to that of $add$, with the only difference that, in the second case, $sub(v_1, v_2) = v_1 -_{\mathbb{N}} v_2$, where $v_1 -_{\mathbb{N}} v_2 = \max\{v_1 - v_2, 0\}$ for each $v_1, v_2 \in \mathbb{N}$.
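
As an illustration, $add$ and $sub$ can be sketched in Python. This is a sketch under assumptions: the sentinels `TOP` and `BOTTOM` are hypothetical stand-ins for the two adjoined lattice elements, non-negative Python integers stand in for the naturals, and any other value (for instance a function) falls into the "otherwise" case.

```python
TOP, BOTTOM = "TOP", "BOTTOM"  # hypothetical sentinels for the adjoined elements


def is_nat(value):
    """True for non-negative integers, which stand in for the naturals."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def add(v1, v2):
    if v1 == TOP and v2 == TOP:
        return TOP
    if is_nat(v1) and is_nat(v2):
        return v1 + v2
    return BOTTOM  # ill-typed arguments, e.g. adding two functions


def sub(v1, v2):
    if v1 == TOP and v2 == TOP:
        return TOP
    if is_nat(v1) and is_nat(v2):
        return max(v1 - v2, 0)  # truncated subtraction on the naturals
    return BOTTOM
```

For example, `sub(3, 5)` yields `0` rather than a negative number, and `add(len, 1)` yields `BOTTOM` because a function is not a natural.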


The semantics of the language is obtained by defining an algebra H over a 
signature §,° then the initiality yields the unique homomorphism from the term 
algebra. A Scheme term denotes a continuous function in the semantic domain 
H, =P o> V. The interpretation functions of the operators are defined by the 
following equations: 


zl = = get» [Ara (ê) = lVooV O absp v,v (ê (a) put,) 
n], (€1, €2) = app o 4é1, é2) [proc?];"(é) = proc? o ê 

nly = val, lifo] °*(é1, ê2,ê3) = choice o qê1, êz, é3) 
+57" (êi, ê2) = = addo {é,, é2) [nat?]57° (ê) = nat? o ê 


-Ja ° (é1, €2) = sub o (é1, ê2) 


For the sake of simplicity, we made a minor change to the language presented in [30]. They have an extra operator wrong to print an error message in case of an illegal operation, due to the lack of a type system. For instance, the sum of two functions produces the error wrong "non-number". To avoid adding cases almost everywhere in the definition of the interpretation functions, we let ill-typed terms denote the value $\bot$ without an explicit encoding of the error message. Furthermore, we denote by = the function application.


⁸ We do not define § explicitly since it can be inferred from the algebra equations below.

The ML-like language defined in [30] is an extended version of the simply-typed lambda calculus. As before, we provide its semantics by defining an algebra M over an order-sorted signature $M = (S_2, \le_2, \Sigma_2)$.

Let $I$ (should read ‘iota’) be a set of base types and $K$ an $I$-sorted set of base values $K = \{K_\iota \mid \iota \in I\}$. We inductively define the set of simple types $T$: if $\iota$ is a base type, then it is a simple type; if $\tau, \tau'$ are simple types, then $(\tau) \to (\tau')$ is a simple type (henceforth we omit the parentheses). We abuse notation and extend $K$ to the $T$-sorted set of simple values $K = \{K_\tau \mid \tau \in T\}$, where $K_{\tau \to \tau'} = K_\tau \to K_{\tau'}$.

The set of all ML environments is defined as the set of all total functions $\Delta = Y \to K$, where $Y = \{y_1, y_2, \dots\}$ is a set of variables disjoint from $X$ (this assumption comes from [30]) and $K = \bigcup_{\tau \in T} K_\tau$. We instantiate $I = \{n\}$ and $K_n = \mathbb{N}$. The poset $(S_2, \le_2)$ carries all the simple types (i.e., $T \subseteq S_2$) and the sort $t$; $\le_2$ is the reflexive relation on $S_2$ plus $\tau \le_2 t$ for each $\tau \in T$. An ML term of type $\tau$ denotes a total function in $M_\tau = \Delta \to K_\tau$, and we define $M_t = \Delta \to K$. Due to the Turing-incompleteness of such a language, we do not need all the mathematical machinery of [15,46] to formalize its semantics.


yt = 5 + Sy) A Ia 7 Ò = 5 ke E(k /y)) 
mg <5 n Wa (Gf) = 5% EEO) 
HM (fa, fiz) = 8 A(S) + oS) PIM” a f2) = 6 fra (6) —w ål) 


H- 


foli (Ê, th, f2) =m 
ACS) = 0 2 f1 (8) 8 2(8) ) 


Until now, we have just formalized the single languages. The multi-language $\mathcal{A}$ that combines Scheme and ML is obtained by requiring $e \ltimes \tau$ and $\tau \ltimes e$ in order to use ML terms in place of Scheme terms and vice versa. However, in the simplest version of the natural embedding, “the system has stuck states, since a boundary might receive a value of an inappropriate shape” [30]. They restore type soundness by first employing dynamic checks, and then by decoupling error handling from value conversion through the use of higher-order contracts [12]. We limit ourselves here to describing the first version; the subsequent refinements can be embodied by further complicating the semantics of the boundary functions (we have not imposed any constraints on them).

Since we need a value representing the notion of stuck state in ML, we have 
to extend the algebra M. This is particularly easy by exploiting the underlying 
framework: We make M+ into an order-sorted M-algebra by defining M+ = 
At — K}, where A+ = Y > K+, K+ = Uer K+, and K} = K,U {L}, and 
the T-sorted injection ¢ from M, to M+ such that y(t) = t. Now, M+ becomes 
an algebra by letting y to be an order-sorted St-homomorphism (this in turn 
forces [—]. = [-] Wi) and letting the interpretation functions to denote the 
value L in the remaining non-yet defined cases (namely, they compute the value 
L whenever one of their arguments is L). 


316 S. Buro and I. Mastroeni 


The boundary function ae, (ê) moves the Scheme value ê: P o> V in M;: 


"e aN? (ê) if ê = val, for some n € N° 
e, T T F . 
aye" (ê) otherwise 


where aù (van) =(T=NAnNEN? nsl) and 


Soko Dy yl TT (Ger (E 0 put, (L, ar elkr)))) 
if r =T > 7” and ê= ivyosy o absp y,y (ê o put,) 
agr (é)= for some z € X and & € V o> V 


otherwise 


Vice versa, &+ (f) moves values from ML to Scheme. Its definition is analo- 
gous to the previous case: an el) = val, where i = d+ n, and 


Qr>r' e = Prout ESA (ar elt (L [ae, r(v )/yl)) 


These definitions adhere to the conversion approach of the natural embedding
in [30]: If ê is the value denoted by a natural number in Scheme, then it is 
ee ery from cases deriving from ill-typed terms—by aN to the corre- 
sponding constant function denoting the same natural value in ML. Otherwise, 
if ê is the value denoted by a Scheme function, then it is mapped by a to 
the ML function with variable x at type 7 — 7’ such that converts its argument 
of type T to the Scheme equivalent by its conversion through &+ e to x. Then it 
runs the original procedure ê on it and convert back the result by ae,7. 

Since the given boundary functions are subsort polymorphic, we can improve 
the construction and handle all the value conversions with a single polymorphic 
operator as explained in Sect. 4.1. 


7 Concluding Remarks 


In this paper, we have addressed the problem of providing a formal semantics to the combination of programming languages, the so-called multi-languages. We have introduced a new algebraic framework for modeling this new paradigm, and we have constructively shown how to attain a multi-language specification by only stipulating (1) how the syntactic categories of the single languages have to be combined and (2) how the values may flow from one language to the other. We have proved the suitability of the framework to unambiguously yield the algebraic semantics of each multi-language term, while simultaneously preserving the semantics of the single languages. We have also proved that combining languages is a closed operation, i.e., that every multi-language admits an equivalent order-sorted representation. In particular, we have focused our study on the semantic properties of boundary functions in order to provide three different notions of multi-language designed to suit both general and specific cases.

To the best of our knowledge, this is the first attempt to provide a formal 
semantics of a multi-language independently from the combined languages. 


Related Works. Cross-language interoperability is a well-researched area from both theoretical and practical points of view. The work most closely related to our approach is undoubtedly [30], which provides operational semantics to a combined language obtained by embedding a Scheme-like language into an ML-like language. Such an outcome is achieved by introducing boundaries, syntactic constructs that model the flow of values from one language to the other. Our boundary functions draw heavily from their work. Nonetheless, we shift them to a semantic level, in order to accommodate several variants of multi-language constructions.

[7,21,36,40,53] take a similar line and combine typed and untyped languages (Lua and ML [40], Java and PLT Scheme [21], or Assembly and a typed functional language [36]), focusing on typing issues and value-exchange techniques. Instead of focusing on a particular problem, we adopt a rather general framework to model languages. This choice abstracts away many low-level details, allowing us to reason about semantic concerns in more general terms, without having to fix any particular pair of languages.

A lot of work has been done on multi-language runtime mechanisms: [20] provides a type system for a fragment of Microsoft Intermediate Language (IL) used by the .NET framework, which allows programmers to write components in several languages (C#, Visual Basic, VBScript, ...) that are then translated to IL. [22] proposes a virtual machine that can execute the composition of dynamically typed programming languages (Ruby and JavaScript) and a statically typed one (C). [4,5] describe a multi-language runtime mechanism achieved by combining single-language interpreters of (different versions of) Python and Prolog.


Future Works. From our perspective, the research presented in this paper opens up three directions. Firstly, future work should aim to provide an operational semantics for the formalization of multi-languages. Rewriting logic seems the most reasonable approach to unifying the denotational world, presented in this paper, with the operational one [31]. This line of research is particularly useful in order to move towards an implementation of an automatic tool able to combine languages such that the resulting multi-language guarantees the results proved in the paper.

Secondly, future research should use the multi-language model to study the problem of analyzing multi-language programs. In particular, we aim at investigating how it is possible to obtain analyses of multi-language programs by merging already existing analyses of the single combined languages.

Finally, further studies should investigate the problem of compiling multi-languages. Current compilers are closed tools, non-parametric on language constructs (for instance, we cannot compile a single if-then-else term of a standard language like C or Java unless it is plugged into a valid program). Several works on typing [1,20,26], compiling [2,37], and running [23,50] multi-language programs already exist, but without providing a formal notion of multi-language. It would be beneficial to study how their approaches can be applied to the formal framework developed in this paper.


Abstract. We define a new denotational semantics for a first-order 
probabilistic programming language in terms of probabilistic event struc- 
tures. This semantics is intensional, meaning that the interpretation of 
a program contains information about its behaviour throughout execu- 
tion, rather than a simple distribution on return values. In particular, 
occurrences of sampling and conditioning are recorded as explicit events, 
partially ordered according to the data dependencies between the corre- 
sponding statements in the program. 

This interpretation is adequate: we show that the usual measure- 
theoretic semantics of a program can be recovered from its event struc- 
ture representation. Moreover it can be leveraged for MCMC inference: 
we prove correct a version of single-site Metropolis-Hastings with incre- 
mental recomputation, in which the proposal kernel takes into account 
the semantic information in order to avoid performing some of the redun- 
dant sampling. 


Keywords: Probabilistic programming · Denotational semantics · Event structures · Bayesian inference


1 Introduction 


Probabilistic programming languages [8] were put forward as promising tools 
for practitioners of Bayesian statistics. By extending traditional programming 
languages with primitives for sampling and conditioning, they allow the user 
to express a wide class of statistical models, and provide a simple interface for 
encoding inference problems. Although the subject of active research, it is still 
notoriously difficult to design inference methods for probabilistic programs which 
perform well for the full class of expressible models. 

One popular inference technique, proposed by Wingate et al. [21], involves adapting well-known Markov chain Monte Carlo (MCMC) methods from statistics to probabilistic programs, by manipulating program traces. One such method is the Metropolis-Hastings algorithm, which relies on a key proposal step: given a program trace $x$ (a sequence $x_1, \dots, x_n$ of random choices with their likelihood), a proposal for the next trace sample is generated by choosing $i \in \{1, \dots, n\}$ uniformly, resampling $x_i$, and then continuing to execute the program, only performing additional sampling for those random choices not appearing in $x$. The variables already present in $x$ are not resampled: only their likelihood is updated according to the new value of $x_i$. Likewise, some conditioning statements must be re-evaluated in case the corresponding weight is affected by the change to $x_i$.
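
The proposal step described above can be sketched in Python. This is a simplified illustration, not the algorithm of [21] or of this paper: random choices are addressed by hypothetical string names, the helper names `run`, `propose`, and `model` are made up for the example, and the likelihood bookkeeping and the accept/reject step of Metropolis-Hastings are left out.

```python
import random


def run(program, reuse, rng):
    """Run `program`, reusing recorded choices and sampling only the missing ones."""
    trace = {}

    def sample(name, sampler):
        value = reuse[name] if name in reuse else sampler(rng)
        trace[name] = value
        return value

    result = program(sample)
    return trace, result


def propose(program, trace, rng):
    """Pick one recorded choice uniformly, forget it, and re-execute the program."""
    site = rng.choice(sorted(trace))
    reuse = {name: value for name, value in trace.items() if name != site}
    new_trace, result = run(program, reuse, rng)
    return site, new_trace, result


def model(sample):
    """Example program: the choice "y" exists only on one branch."""
    x = sample("x", lambda rng: rng.random())
    if x < 0.5:
        return x + sample("y", lambda rng: rng.gauss(0.0, 1.0))
    return x


rng = random.Random(0)
trace, _ = run(model, {}, rng)
site, new_trace, _ = propose(model, trace, rng)
```

Every choice other than `site` that occurs in both traces keeps its old value, while a choice that only appears after the change (here `"y"`, when the new `x` falls below 0.5) is sampled afresh.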

Observe that there is some redundancy in this process, since the updating process above will only affect variables and observations when their density directly depends on the value of $x_i$. This may significantly affect performance: to solve an inference problem one must usually perform a large number of proposal steps. To overcome this problem, some recent implementations, notably [12,25], make use of incremental recomputation, whereby some of the redundancy can be avoided via a form of static analysis. However, as pointed out by Kiselyov [13], establishing the correctness of such implementations is tricky.

Here we address this by introducing a theoretical framework in which to 
reason about data dependencies in probabilistic programs. Specifically, our first 
contribution is to define a denotational semantics for a first-order probabilis- 
tic language, in terms of graph-like structures called event structures [22]. In 
event structures, computational events are partially ordered according to the 
dependencies between them; additionally they can be equipped with quantita- 
tive information to represent probabilistic processes [16,23]. This semantics is 
intensional, unlike most existing semantics for probabilistic programs, in which 
the interpretation of a program resembles a probability distribution on output 
values. We relate our approach to a measure-theoretic semantics [18] through an 
adequacy result. 

Our second contribution is the design of a Metropolis-Hastings algorithm 
which exploits the event structure representation of the program at hand. Some 
of the redundancy in the proposal step of the algorithm is avoided by taking into 
account the extra dependency information given by the semantics. We provide a 
proof of correctness for this algorithm, and argue that an implementation is real- 
istically achievable: we show in particular that all graph structures involved and 
the associated quantitative information admit a finite, concrete representation. 


Outline of the Paper. In Sect. 2 we give a short introduction to probabilistic programming. We define our main language of study and its measure-theoretic semantics. In Sect. 3.1, we introduce MCMC methods and the Metropolis-Hastings algorithm in the context of probabilistic programming. We then motivate the need for intensional semantics in order to capture data dependency. In Sect. 4 we define our interpretation of programs and prove adequacy. In Sect. 5 we define an updated version of the algorithm, and prove its correctness. We conclude in Sect. 6.
The proofs of the statements are detailed in the technical report [4].


2 Probabilistic Programming 


In this section we motivate the need for capturing data dependency in probabilis- 
tic programs. Let us start with a brief introduction to probabilistic programming 
—a more comprehensive account can be found in [8]. 


2.1 Conditioning and Posterior Distribution 


Let us introduce the problem of inference in probabilistic programming from the 
point of view of programming language theory. 

We consider a first-order programming language enriched with a real num- 
ber type R and a primitive sample for drawing random values from a given 
family of standard probability distributions. The language is idealised—but it 
is assumed that an implementation of the language comprises built-in sampling 
procedures for those standard distributions. Thus, repeatedly running the pro- 
gram sample Uniform (0,1) returns a sequence of values approaching the true 
uniform distribution on [0, 1]. 

Via other constructs in the language, standard distributions can be combined, 
as shown in the following example program of type R: 


let x = sample Uniform (0, 1) in 
let y = sample Gaussian (x, 2) in 
x + y 


Here the output will follow a probability distribution built out of the usual 
uniform and Gaussian distributions. Many probabilistic programming languages 
will offer more general programming constructs: conditionals, recursion, higher- 
order functions, data types, etc., enabling a wide range of distributions to be 
expressed in this way. Such a program is sometimes called a generative model. 
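
For intuition, the example program can be run as a forward sampler in Python. The sketch assumes that the second parameter of Gaussian is a standard deviation (the text does not say whether it is a standard deviation or a variance), and the function name `generative_model` is hypothetical.

```python
import random


def generative_model(rng):
    """Forward-sample the example program once."""
    x = rng.uniform(0.0, 1.0)  # let x = sample Uniform (0, 1)
    y = rng.gauss(x, 2.0)      # let y = sample Gaussian (x, 2), std assumed
    return x + y


rng = random.Random(0)
samples = [generative_model(rng) for _ in range(20_000)]
empirical_mean = sum(samples) / len(samples)
```

Since both `x` and `y` have expectation 0.5, the empirical mean of the returned values should be close to 1.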


Conditioning. The process of conditioning involves rescaling the distribution 
associated with a generative model, so as to reflect some bias. Going back to the 
example above, say we have made some external measurement indicating that 
y = 0, but we would like to account for possible noise in the measurement using 
another Gaussian. To express this we modify the program as follows: 


let x = sample Uniform (0, 1) in 
let y = sample Gaussian (x, 2) in 
observe y (Gaussian (0, 0.01)); 
x+y; 


The purpose of the observe statement is to increase the occurrence of executions 
in which y is close to 0; the original distribution, known as the prior, must be 
updated accordingly. The probabilistic weight of each execution is multiplied 
by an appropriate score, namely the likelihood of the current value of y in 
the Gaussian distribution with parameters (0,0.01). (This is known as a soft 
constraint. Conditioning via hard constraints, i.e. only giving a nonzero score to 
executions where y is exactly 0, is not practically feasible.) 

The language studied here does not have an observe construct, but instead 
an explicit score primitive; this appears already in [18,19]. So the third line 
in the program above would instead be score(pdf-Gaussian (0, 0.01) (y)) 
where pdf-Gaussian (0, 0.01) is the density function of the Gaussian distri- 
bution. The resulting distribution is not necessarily normalised. We obtain the 
posterior distribution by computing the normalising constant, following Bayes’ 
rule: 

posterior ∝ likelihood × prior. 


This process is known as Bayesian inference and has ubiquitous applications. The 
difficulty lies in computing the normalising constant, which is usually obtained 
as an integral. Below we discuss approximate methods for sampling from the 
posterior distribution; they do not rely on this normalising step. 
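
To make the role of the score concrete, the following Python sketch weights each execution of the conditioned example by its score and forms a weighted average of the return values. This is plain likelihood weighting, a standard technique that is not the method developed in this paper. The sketch assumes that the second Gaussian parameter is a standard deviation, and the names `gaussian_pdf`, `weighted_run`, and `posterior_mean` are hypothetical.

```python
import math
import random


def gaussian_pdf(mean, std, value):
    """Density of the Gaussian distribution with the given mean and std."""
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def weighted_run(rng):
    """One execution of the program, returning its value and its score."""
    x = rng.uniform(0.0, 1.0)
    y = rng.gauss(x, 2.0)
    weight = gaussian_pdf(0.0, 0.01, y)  # score(pdf-Gaussian (0, 0.01) (y))
    return x + y, weight


def posterior_mean(num_runs, rng):
    """Weighted average of the return values; dividing by the total weight
    replaces the explicit computation of the normalising constant."""
    runs = [weighted_run(rng) for _ in range(num_runs)]
    total_weight = sum(weight for _, weight in runs)
    return sum(value * weight for value, weight in runs) / total_weight


estimate = posterior_mean(100_000, random.Random(0))
```

Most executions receive a weight that is numerically zero, because `y` is rarely close to 0. This inefficiency is one reason why MCMC methods are attractive for this kind of model.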


Measure Theory. Because this work makes heavy use of probability theory, we start with a brief account of measure theory. A standard textbook for this is [1]. Recall that a measurable space is a set $X$ equipped with a $\sigma$-algebra $\Sigma_X$: a set of subsets of $X$ containing $\emptyset$ and closed under complements and countable unions. Elements of $\Sigma_X$ are called measurable sets. A measure on $X$ is a function $\mu : \Sigma_X \to [0, \infty]$ such that $\mu(\emptyset) = 0$ and, for any countable family $\{U_i\}_{i \in I}$ of pairwise disjoint measurable sets, $\mu(\bigcup_{i \in I} U_i) = \sum_{i \in I} \mu(U_i)$.

An important example is that of the set $\mathbb{R}$ of real numbers, whose $\sigma$-algebra $\Sigma_{\mathbb{R}}$ is generated by the intervals $[a, b)$, for $a, b \in \mathbb{R}$ (in other words, it is the smallest $\sigma$-algebra containing those intervals). The Lebesgue measure on $(\mathbb{R}, \Sigma_{\mathbb{R}})$ is the (unique) measure $\lambda$ assigning $b - a$ to every interval $[a, b)$ (with $a < b$).

Given measurable spaces $(X, \Sigma_X)$ and $(Y, \Sigma_Y)$, a function $f : X \to Y$ is measurable if for every $U \in \Sigma_Y$, $f^{-1}U \in \Sigma_X$. A measurable function $f : X \to [0, \infty]$ can be integrated: given $U \in \Sigma_X$ the integral $\int_U f \,\mathrm{d}\lambda$ is a well-defined element of $[0, \infty]$; indeed the map $\mu : U \mapsto \int_U f \,\mathrm{d}\lambda$ is a measure on $X$, and $f$ is said to be a density for $\mu$. The precise definition of the integral is standard but slightly more involved; we omit it.

We identify the following important classes of measures: a measure $\mu$ on $(X, \Sigma_X)$ is a probability measure if $\mu(X) = 1$. It is finite if $\mu(X) < \infty$, and it is s-finite if $\mu = \sum_{i \in \mathbb{N}} \mu_i$, a pointwise, countable sum of finite measures.

We recall the usual product and coproduct constructions for measurable spaces and measures. If $\{X_i\}_{i \in I}$ is a countable family of measurable spaces, their product $\prod_{i \in I} X_i$ and coproduct $\coprod_{i \in I} X_i = \bigcup_{i \in I} \{i\} \times X_i$ as sets can be turned into measurable spaces, where:


— $\Sigma_{\prod_{i \in I} X_i}$ is generated by $\{\prod_{i \in I} U_i \mid U_i \in \Sigma_{X_i} \text{ for all } i\}$, and
— $\Sigma_{\coprod_{i \in I} X_i}$ is generated by $\{\{i\} \times U_i \mid i \in I \text{ and } U_i \in \Sigma_{X_i}\}$.


The measurable spaces in this paper all belong to a well-behaved subclass: call $(X, \Sigma_X)$ a standard Borel space if it is either countable and discrete (i.e. all $U \subseteq X$ are in $\Sigma_X$), or measurably isomorphic to $(\mathbb{R}, \Sigma_{\mathbb{R}})$. Note that standard Borel spaces are closed under countable products and coproducts, and that in a standard Borel space all singletons are measurable.


2.2 A First-Order Probabilistic Programming Language


We consider a first-order, call-by-value language $\mathcal{L}$ with types


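$$A, B ::= 1 \mid \mathbb{R} \mid \prod_{i \in I} A_i \mid \coprod_{i \in I} A_i$$

where $I$ ranges over nonempty countable sets. The types denote measurable spaces in a natural way: $[\![1]\!]$ is the singleton space, and $[\![\mathbb{R}]\!] = (\mathbb{R}, \Sigma_{\mathbb{R}})$. Products and coproducts are interpreted via the corresponding measure-theoretic constructions: $[\![\prod_{i \in I} A_i]\!] = \prod_{i \in I} [\![A_i]\!]$ and $[\![\coprod_{i \in I} A_i]\!] = \coprod_{i \in I} [\![A_i]\!] = \bigcup_{i \in I} \{i\} \times [\![A_i]\!]$. Moreover, each measurable space $[\![A]\!]$ has a canonical measure $\mu_{[\![A]\!]} : \Sigma_{[\![A]\!]} \to \mathbb{R}$, induced from the Lebesgue measure on $\mathbb{R}$ and the Dirac measure on $[\![1]\!]$ via standard product and coproduct measure constructions.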

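The terms of $\mathcal{L}$ are given by the following grammar:

$$\begin{aligned} M, N ::=\ & () \mid M; N \mid f\,M \mid \mathtt{let}\ a = M\ \mathtt{in}\ N \mid a \\ \mid\ & (M_i)_{i \in I} \mid \mathtt{case}\ M\ \mathtt{of}\ \{(i, x) \Rightarrow N_i\}_{i \in I} \\ \mid\ & \mathtt{sample}\ d\ (M) \mid \mathtt{score}\ M \end{aligned}$$

and we use standard syntactic sugar to manipulate integers and booleans: $\mathbb{B} = 1 + 1$, $\mathbb{N} = \coprod_{i \in \omega} 1$, and constants are given by the appropriate injections. Conditionals and sequencing can be expressed in the usual way: $\mathtt{if}\ M\ \mathtt{then}\ N_1\ \mathtt{else}\ N_2 = \mathtt{case}\ M\ \mathtt{of}\ \{(i, \_) \Rightarrow N_i\}_{i \in \{1, 2\}}$, and $M; N = \mathtt{let}\ a = M\ \mathtt{in}\ N$, where $a$ does not occur in $N$. In the grammar above: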


— $f$ ranges over measurable functions $[\![A]\!] \to [\![B]\!]$, where $A$ and $B$ are types;

— $d$ ranges over a family of parametric distributions over the reals, i.e. measurable functions $\mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$, for some $n \in \mathbb{N}$, such that for every $r \in \mathbb{R}^n$, $\int d(r, -) = 1$. For the purposes of this paper we ignore all issues related to invalid parameters, arising from e.g. a call to gaussian with standard deviation $\sigma = 0$. (An implementation could, say, choose to behave according to an alternative distribution in this case.)


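The typing rules are as follows:

$$\frac{\Gamma \vdash M : A \quad \Gamma, a : A \vdash N : B}{\Gamma \vdash \mathtt{let}\ a = M\ \mathtt{in}\ N : B} \qquad \frac{\Gamma \vdash M : \mathbb{R}^n \quad d : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}}{\Gamma \vdash \mathtt{sample}\ d\ (M) : \mathbb{R}}$$

$$\frac{\Gamma \vdash M : \mathbb{R}}{\Gamma \vdash \mathtt{score}\ M : 1} \qquad \frac{}{\Gamma, a : A \vdash a : A} \qquad \frac{}{\Gamma \vdash () : 1}$$

$$\frac{\Gamma \vdash M : \coprod_{i \in I} A_i \quad \Gamma, x : A_i \vdash N_i : C}{\Gamma \vdash \mathtt{case}\ M\ \mathtt{of}\ \{(i, x) \Rightarrow N_i\}_{i \in I} : C} \qquad \frac{\Gamma \vdash M_i : A_i}{\Gamma \vdash (M_i)_{i \in I} : \prod_{i \in I} A_i}$$

$$\frac{f : [\![A]\!] \to [\![B]\!] \text{ measurable} \quad \Gamma \vdash M : A}{\Gamma \vdash f\,M : B}$$

Among the measurable functions $f$, we point out the following of interest: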


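— The usual product projections $\pi_i : [\![\prod_{i \in I} A_i]\!] \to [\![A_i]\!]$ and coproduct injections $\iota_i : [\![A_i]\!] \to [\![\coprod_{i \in I} A_i]\!]$;

— The operators $+, \times : \mathbb{R}^2 \to \mathbb{R}$;

— The tests, e.g. $> 0 : [\![\mathbb{R}]\!] \to [\![\mathbb{B}]\!]$;

— The constant functions $1 \to A$ of the form $() \mapsto a$ for some $a \in [\![A]\!]$.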


Examples for $d$ include $\mathtt{uniform} : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R}$, $\mathtt{gaussian} : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R}$, ...
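
As a rough illustration of the requirement that each $d(r, -)$ integrates to 1, the two example distributions can be written as Python density functions and checked numerically. The parameter conventions are assumptions made for this sketch: `uniform` takes the two interval bounds and `gaussian` takes a mean and a standard deviation. The helper `integrate` is hypothetical and uses a plain trapezoidal rule.

```python
import math


def uniform(params, x):
    """Density of the uniform distribution on [low, high]."""
    low, high = params
    return 1.0 / (high - low) if low <= x <= high else 0.0


def gaussian(params, x):
    """Density of the Gaussian distribution with the given mean and std."""
    mean, std = params
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def integrate(density, params, low, high, steps=100_000):
    """Trapezoidal approximation of the integral of density(params, -) on [low, high]."""
    width = (high - low) / steps
    total = 0.5 * (density(params, low) + density(params, high))
    total += sum(density(params, low + i * width) for i in range(1, steps))
    return total * width
```

For instance, `integrate(gaussian, (0.0, 1.0), -10.0, 10.0)` and `integrate(uniform, (0.0, 1.0), -1.0, 2.0)` should both be close to 1, up to discretisation error.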


2.3 Measure-Theoretic Semantics of Programs 


We now define a semantics of probabilistic programs using the measure-theoretic concept of kernel, which we define shortly. The content of this section is not new: using kernels as semantics for probabilistic programs was originally proposed in [14], while the (more recent) treatment of conditioning (score) via s-finite kernels is due to Staton [18]. Intuitively, kernels provide a semantics of open terms $\Gamma \vdash M : A$ as measures on $[\![A]\!]$ varying according to the values of variables in $\Gamma$.

Formally, a kernel from (X, Xx) to (Y, Xy ) is a function k : X x Ly — [0, co] 
such that for each x € X, k(x,—) is a measure, and for each U € Xy, k(—,U) is 
measurable. (Here the o-algebra 21,9) is the restriction of that of R+{oo}.) We 
say k is finite (resp. probabilistic) if each k(x, —) is a finite (resp. probability) 
measure, and it s-finite if it is a countable pointwise sum -ez ki of finite 
kernels. We write k : X ~ Y when k is an s-finite kernel from X to Y. 

A term I F M : A will denote an s-finite kernel [M] : [I] ~~ [A], where 
the context [ = zı : Aj,...,2%p : An denotes the product of its components: 
W] = [Ai] x- x [An]. 

Notice that any measurable function $f : X \to Y$ can be seen as a deterministic kernel $f^\dagger : X \rightsquigarrow Y$. Given two s-finite kernels $k : A \rightsquigarrow B$ and $l : A \times B \rightsquigarrow C$, we define their composition $l \circ k : A \rightsquigarrow C$:

$$(l \circ k)(a, X) = \int_{b \in B} l((a, b), X) \times k(a, \mathrm{d}b).$$

Staton [18] proved that $l \circ k$ is an s-finite kernel.
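
To make the composition formula concrete, here is a small Python sketch restricted to finite spaces, where the integral over $b \in B$ becomes a finite sum. Representing a kernel as a function from a point to a dictionary of weights is an assumption of this example, as are the names `compose`, `coin` and `add`.

```python
from fractions import Fraction

def compose(l, k, B):
    """Finite analogue of (l o k)(a, {c}) = sum over b in B of l((a, b), {c}) * k(a, {b}).

    A kernel is modelled as a function from a point to a dict mapping each
    point of the target space to the weight of the corresponding singleton.
    """
    def lk(a):
        out = {}
        for b in B:
            weight_b = k(a).get(b, 0)
            for c, weight_c in l((a, b)).items():
                out[c] = out.get(c, 0) + weight_c * weight_b
        return out
    return lk

# k ignores its input and flips a fair coin; l is the deterministic kernel
# (Dirac measure) associated with addition.
coin = lambda a: {0: Fraction(1, 2), 1: Fraction(1, 2)}
add = lambda ab: {ab[0] + ab[1]: Fraction(1)}

print(compose(add, coin, [0, 1])(1))
```

Here `add` plays the role of a lifted measurable function $f^\dagger$, so the composite sends the input 1 to the uniform measure on the two outcomes 1 and 2.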
The interpretation of terms is defined by induction: 


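- $\llbracket () \rrbracket$ is the lifting of $\llbracket \Gamma \rrbracket \to 1 : x \mapsto ()$.

- $\llbracket \mathtt{let}\ a = M\ \mathtt{in}\ N \rrbracket$ is $\llbracket N \rrbracket \circ \llbracket M \rrbracket$.

- $\llbracket f\,M \rrbracket = f^\dagger \circ \llbracket M \rrbracket$.

- $\llbracket a \rrbracket(x, X) = \delta_x(X)$, the Dirac distribution: $\delta_x(X) = 1$ if $x \in X$ and zero otherwise.

- $\llbracket \mathtt{sample}\ d\ (M) \rrbracket = \mathrm{sam}_d \circ \llbracket M \rrbracket$ where $\mathrm{sam}_d : \mathbb{R}^n \rightsquigarrow \mathbb{R}$ is given by $\mathrm{sam}_d(r, X) = \int_{x \in X} d(r, x)\,\mathrm{d}x$.

- $\llbracket \mathtt{score}\ M \rrbracket = \mathrm{sco} \circ \llbracket M \rrbracket$ where $\mathrm{sco} : \llbracket \mathbb{R} \rrbracket \rightsquigarrow \llbracket 1 \rrbracket$ is $\mathrm{sco}(r, X) = r \cdot \delta_{()}(X)$.

- $\llbracket (M_i)_{i \in I} \rrbracket(\gamma, \prod_{i \in I} X_i) = \prod_{i \in I} \llbracket M_i \rrbracket(\gamma, X_i)$: this is well-defined since the $\prod_{i \in I} X_i$ generate the measurable sets of the product space.

- $\llbracket \mathtt{case}\ M\ \mathtt{of}\ \{(i, x) \Rightarrow N_i\}_{i \in I} \rrbracket = \mathrm{coprod} \circ \llbracket M \rrbracket$ where $\mathrm{coprod} : \llbracket \Gamma \rrbracket \times \llbracket \sum_{i \in I} A_i \rrbracket \rightsquigarrow \llbracket B \rrbracket$ maps $(\gamma, \{i\} \times X)$ to $\llbracket N_i \rrbracket(\gamma, X)$.

We observe that when $M$ is a program making no use of conditioning (i.e. a generative model), the kernel $\llbracket M \rrbracket$ is probabilistic:


Lemma 1. For $\Gamma \vdash M : A$ without scores, $\llbracket M \rrbracket(\gamma, \llbracket A \rrbracket) = 1$ for each $\gamma \in \llbracket \Gamma \rrbracket$.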


2.4 Exact Inference 


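Note that a kernel $1 \rightsquigarrow \llbracket A \rrbracket$ is the same as a measure on $\llbracket A \rrbracket$. Given a closed program $\vdash M : A$, the measure $\llbracket M \rrbracket$ is a combination of the prior (occurrences of sample) and the likelihood (score). Because score can be called on arbitrary arguments, it may be the case that the measure of the total space (that is, the coefficient $\llbracket M \rrbracket(\llbracket A \rrbracket)$, often called the model evidence) is $0$ or $\infty$.

Whenever this is not the case, $\llbracket M \rrbracket$ can be normalised to a probability measure, the posterior distribution. For every $U \in \Sigma_{\llbracket A \rrbracket}$,

$$\mathrm{norm}\llbracket M \rrbracket(U) = \frac{\llbracket M \rrbracket(U)}{\llbracket M \rrbracket(\llbracket A \rrbracket)}.$$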

However, in many cases, this computation is intractable. Thus the goal of approx- 
imate inference is to approach norm[M], the true posterior, using a well-chosen 
sequence of samples. 
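
For intuition, normalisation can be written out directly when the measure is supported on finitely many points. The sketch below assumes a measure is given as a dictionary from points to weights; the function name `normalise` and the error raised in the degenerate cases are choices made for this example.

```python
import math

def normalise(measure):
    """Normalise a finite discrete measure (dict: point -> weight) to a posterior.

    Raises ValueError when the model evidence (the total mass) is 0 or
    infinite, the two cases in which the posterior is undefined.
    """
    evidence = sum(measure.values())
    if evidence == 0 or math.isinf(evidence):
        raise ValueError("model evidence is 0 or infinite")
    return {point: weight / evidence for point, weight in measure.items()}

print(normalise({"a": 1.0, "b": 3.0}))
```

In the continuous setting the evidence is an integral that usually has no closed form, which is why the approximate methods of the next section are needed.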


3 Approximate Inference via Intensional Semantics 


3.1 An Introduction to Approximate Inference 


In this section we describe the Metropolis-Hastings (MH) algorithm for approximate inference in the context of probabilistic programming. Metropolis-Hastings is a generic algorithm to sample from a probability distribution $D$ on a measurable state space $X$, of which we know the density $d : X \to \mathbb{R}$ up to some normalising constant.

MH is part of a family of inference algorithms called Markov chain Monte Carlo, in which the posterior distribution is approximated by a series of samples generated using a Markov chain.

Formally, the MH algorithm defines a Markov chain $M$ on the state space $X$, that is a probabilistic kernel $M : X \rightsquigarrow X$. The correctness of the MH algorithm is expressed in terms of convergence. It says that for almost all $x \in X$, the distribution $M^n(x, \cdot)$ converges to $D$ as $n$ goes to infinity, where $M^n$ is the $n$-iteration of $M$: $M \circ \ldots \circ M$. Intuitively, this means that iterated sampling from $M$ gets closer to $D$ with the number of iterations.

The MH algorithm is itself parametrised by a Markov chain, referred to as the proposal kernel $P : X \rightsquigarrow X$: for each sampled value $x \in X$, a proposed value for the next sample is drawn according to $P(x, \cdot)$. Note that correctness only holds under certain assumptions on $P$.

The MH algorithm assumes that we know how to sample from $P$, and that its density is known, i.e. there is a function $p : X^2 \to \mathbb{R}$ such that $p(x, \cdot)$ is the density of the distribution $P(x, \cdot)$.

The MH Algorithm. On an input state $x$, the MH algorithm samples from $P(x, \cdot)$ and gets a new sample $x'$. It then compares the likelihood of $x$ and $x'$ by computing an acceptance ratio $\alpha(x, x')$ which determines whether the return state is $x'$ or $x$. In pseudo-code, for an input state $x \in X$:


1. Sample a new state $x'$ from the distribution $P(x, \cdot)$.
2. Compute the acceptance ratio of $x'$ with respect to $x$:

$$\alpha(x, x') = \min\left(1, \frac{d(x') \times p(x', x)}{d(x) \times p(x, x')}\right)$$

3. With probability $\alpha(x, x')$, return the new sample $x'$, otherwise return the input state $x$.


The formula for $\alpha(x, x')$ is known as the Hastings acceptance ratio and is key to the correctness of the algorithm.
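
The three steps can be sketched in Python as follows. This is a minimal illustration rather than the implementation discussed later in the paper: the names `mh_step` and `mh_chain`, the explicit random generator argument, and the convention of accepting when the denominator is zero are assumptions of the example. Throughout, `p(x, y)` denotes the density of proposing `y` from `x`.

```python
import math
import random

def mh_step(x, d, sample_p, p, rng):
    """One Metropolis-Hastings step from state x.

    d        -- target density, known up to a normalising constant
    sample_p -- sample_p(x, rng) draws a proposal from P(x, -)
    p        -- p(x, y) is the density of proposing y from x
    """
    x_new = sample_p(x, rng)                      # step 1: propose
    num = d(x_new) * p(x_new, x)                  # step 2: Hastings ratio
    den = d(x) * p(x, x_new)
    alpha = 1.0 if den == 0 else min(1.0, num / den)
    return x_new if rng.random() < alpha else x   # step 3: accept or reject

def mh_chain(x0, d, sample_p, p, n, seed=0):
    """Iterate mh_step n times from x0 and return the visited states."""
    rng = random.Random(seed)
    states, x = [], x0
    for _ in range(n):
        x = mh_step(x, d, sample_p, p, rng)
        states.append(x)
    return states

# Target: an unnormalised Gaussian centred at 3; proposal: Gaussian random walk.
target = lambda x: math.exp(-0.5 * (x - 3.0) ** 2)
walk = lambda x, rng: x + rng.gauss(0.0, 1.0)
walk_density = lambda x, y: math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi)

samples = mh_chain(0.0, target, walk, walk_density, 20_000, seed=1)
kept = samples[2_000:]                            # discard a burn-in prefix
print(sum(kept) / len(kept))                      # empirical mean, close to 3
```

Because the random-walk proposal is symmetric, the two occurrences of `p` cancel in this particular example; they matter for asymmetric proposals such as the single-site kernel described next.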

Very little is assumed of P, which makes the algorithm very flexible; but of 
course the convergence rate may vary depending on the choice of P. We give a 
more formal description of MH in Sect. 5.2. 


Single-Site MH and Incremental Recomputation. To apply this algorithm to probabilistic programming, we need a proposal kernel. Given a program $M$, the execution traces of $M$ form a measurable set $X_M$. In this setting the proposal is given by a kernel $X_M \rightsquigarrow X_M$.

A widely adopted choice of proposal is the single-site proposal kernel which, given a trace $x \in X_M$, generates a new trace $x'$ as follows:


1. Select uniformly one of the random choices $s$ encountered in $x$.

2. Sample a new value for this instruction. 

3. Re-execute the program M from that point onwards and with this new value 
for s, only ever resampling a variable when the corresponding instruction did 
not already appear in x. 
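
The single-site proposal can be sketched in Python by replaying a program against a previous trace. This is a simplified illustration: sample sites are identified by hypothetical string names, the program is re-run from the start while reusing recorded values (rather than resuming from the chosen point), and the helper names `run` and `single_site_proposal` are inventions of this example.

```python
import random

def run(program, old_trace, rng):
    """Execute program, reusing the value recorded in old_trace for every
    sample site that already appears there and drawing fresh values otherwise."""
    trace = {}
    def sample(site, draw):
        trace[site] = old_trace[site] if site in old_trace else draw(rng)
        return trace[site]
    result = program(sample)
    return trace, result

def single_site_proposal(program, trace, rng):
    site = rng.choice(sorted(trace))                      # 1. pick one random choice uniformly
    kept = {s: v for s, v in trace.items() if s != site}  # 2. forget its value so it is redrawn
    return run(program, kept, rng)                        # 3. re-execute, reusing the rest

# A small program with a conditional, written against the `sample` callback.
def program(sample):
    x = sample("x", lambda rng: rng.uniform(0, 5))
    if x > 2:
        return sample("then", lambda rng: rng.gauss(3, 1))
    return sample("else", lambda rng: rng.uniform(2, 4))

old, _ = run(program, {}, random.Random(1))
new, _ = single_site_proposal(program, old, random.Random(2))
print(old, new)
```

Even in this tiny example the whole program is traversed on every proposal, which is the redundancy discussed below.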


Observe that there is some redundancy in this process: in the final step, 
the entire program has to be explored even though only a subset of the random 
choices will be re-evaluated. Some implementations of Trace MH for probabilistic 
programming make use of incremental recomputation. 

We propose in this paper to statically compile a program M to an event struc- 
ture Gm which makes explicit the probabilistic dependences between events, thus 
avoiding unnecessary sampling. 


3.2 Capturing Probabilistic Dependencies Using Event Structures 


Consider the program depicted in Fig. 1 in which we are interested in learning the parameters $\mu$ and $\sigma$ of a Gaussian distribution from which we have observed two data points, say $v_1$ and $v_2$. For $i = 1, 2$ the function $f_i : \mathbb{R} \to \mathbb{R}$ expresses a soft constraint; it can be understood as indicating how much the sampled value of $x_i$ matches the observed value $v_i$.

A trace of this program will be of the form

$$\mathrm{Sam}\,\mu \cdot \mathrm{Sam}\,\sigma \cdot \mathrm{Sam}\,x_1 \cdot \mathrm{Sam}\,x_2 \cdot \mathrm{Sco}\,(f_1\,x_1) \cdot \mathrm{Sco}\,(f_2\,x_2) \cdot \mathrm{Rtn}\,(\mu, \sigma),$$

for some $\mu$, $\sigma$, $x_1$, and $x_2 \in \mathbb{R}$ corresponding to sampled values for variables mu, sigma, x1 and x2.


let mu = sample uniform (150, 200) in 
let sigma = sample uniform (1, 50) in 
let x1 = sample gaussian (mu, sigma) in 
let x2 = sample gaussian (mu, sigma) in 
score (f1 x1); score (f2 x2);
(mu, sigma)
Fig. 1. A simple probabilistic program 


A proposal step following the single-site kernel may choose to resample $\mu$; then it must run through the entire trace, checking for potential dependencies on $\mu$, though in this case none of the other variables need to be resampled.

So we argue that viewing a program as a tree of traces is not the most appropriate in this context: we propose instead to compile a program into a partially ordered structure reflecting the probabilistic dependencies.

With our approach, the example above would yield the partial order displayed below on the right-hand side. The nodes on the first line correspond to the samples for $\mu$ and $\sigma$, and those on the second line to $x_1$ and $x_2$. This provides an accurate account of the probabilistic dependencies: whenever $e \le e'$ (where $\le$ is the reflexive, transitive closure of $\to$), it is the case that $e'$ depends on $e$.
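
The benefit of recording dependencies explicitly can be illustrated with a short Python sketch that computes, for an event $e$, all events $e'$ with $e \le e'$. The edge set below is an assumed rendering of the immediate-causality arrows for the program of Fig. 1, restricted to sample and score events; the event names are hypothetical.

```python
def dependents(edges, e):
    """Return all events e' with e <= e', where <= is the reflexive,
    transitive closure of the immediate-causality relation `edges`."""
    seen, stack = {e}, [e]
    while stack:
        for successor in edges.get(stack.pop(), ()):
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen

# Assumed immediate causality for Fig. 1 (return event omitted).
fig1 = {
    "mu": ["x1", "x2"],
    "sigma": ["x1", "x2"],
    "x1": ["score1"],
    "x2": ["score2"],
}

print(sorted(dependents(fig1, "x1")))
print(sorted(dependents(fig1, "mu")))
```

Under these assumptions, a change to x1 only affects score1, so the events for x2 and score2 need not be revisited.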

According to this representation of the program, a trace is no longer a lin- 
ear order, but instead another partial order, similar to the previous one only 
annotated with a specific value for each variable. This is displayed below, on the 
left-hand side; note that the order < is drawn top to bottom. There is an obvious 
erasure map from the trace (left) to the graph (right); this will be important 
later on. 


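[Figure: on the left, the trace as a partial order, with events Sam σ, Sam μ, Sam x1, Sam x2, Sco (f1 x1), Sco (f2 x2) and Rtn (μ, σ); on the right, the corresponding graph, with events Sam, Sam, Sam, Sam, Sco, Sco and Rtn.]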


Conflict and Control Flow. We have seen that a partial order can be used to faithfully represent the data dependency in the program; it is however not sufficient to accurately describe the control flow. In particular, computational events may live in different branches of a conditional statement, as in the following example:


let x = sample uniform (0, 5) in
if x > 2 then sample gaussian (3, 1)
else sample uniform (2, 4)


The last two samples are independent, but also incompatible: in any given trace only one of them will occur. An example of a trace for this program is $\mathrm{Sam}\,1 \cdot \mathrm{Sam}\,3 \cdot \mathrm{Rtn}\,3$.

We represent this information by enriching the partial order with a conflict relation, indicating when two actions are in different branches of a conditional statement. The resulting structure is depicted on the right.

[Figure: event structure of the program above. An initial Sam event is followed by two Sam events in minimal conflict, each followed by its own Rtn event.]

Combining partial order and conflict in this way can be conveniently formalised using event structures [22]:


Definition 1. An event structure is a tuple $(E, \le, \#)$ where $(E, \le)$ is a partially ordered set and $\#$ is an irreflexive, binary relation on $E$ such that


- for every $e \in E$, the set $[e] = \{e' \in E \mid e' \le e\}$ is finite, and
- if $e \# e'$ and $e' \le e''$, then $e \# e''$.


From the partial order $\le$, we extract immediate causality $\to$: $e \to e'$ when $e < e'$ with no events in between; and from the conflict relation, we extract minimal conflict $\sim$: $e \sim e'$ when $e \# e'$ and there are no other conflicts in $[e] \cup [e']$. In pictures we draw $\to$ and $\sim$ rather than $\le$ and $\#$.

A subset $x \subseteq E$ is a configuration of $E$ if it is down-closed (if $e' \le e \in x$ then $e' \in x$) and conflict-free (if $e, e' \in x$ then $\neg(e \# e')$). So in this framework, configurations correspond exactly to partial execution traces of $E$.

The configuration $[e]$ is the causal history of $e$; we also write $[e)$ for $[e] \setminus \{e\}$. We write $\mathcal{C}(E)$ for the set of all finite configurations of $E$, a partial order under inclusion. A configuration $x$ is maximal if it is maximal in $\mathcal{C}(E)$: for every $x' \in \mathcal{C}(E)$, if $x \subseteq x'$ then $x = x'$. We use the notation $x \stackrel{e}{-\!\!\subset} x'$ when $x' = x \cup \{e\}$, and in that case we say $x'$ covers $x$.

An event structure is confusion-free if minimal conflict is transitive, and if 
any two events e,e’ in minimal conflict satisfy [e) = [e’). 
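
For finite event structures, configurations and the covering relation can be checked mechanically. The Python sketch below encodes the conditional example from the previous paragraphs; the class name, the event names, and the choice to store strict causes and conflicts as already-closed sets are assumptions of the example.

```python
from itertools import combinations

class EventStructure:
    def __init__(self, events, causes, conflicts):
        self.events = set(events)
        # causes[e]: all e' with e' < e (strict and transitively closed)
        self.causes = causes
        # conflicts: unordered pairs, assumed already inherited along causality
        self.conflicts = {frozenset(pair) for pair in conflicts}

    def is_configuration(self, x):
        x = set(x)
        down_closed = all(self.causes.get(e, set()) <= x for e in x)
        conflict_free = all(frozenset(pair) not in self.conflicts
                            for pair in combinations(x, 2))
        return x <= self.events and down_closed and conflict_free

    def covers(self, x):
        """All configurations obtained from x by adding exactly one event."""
        x = set(x)
        return [x | {e} for e in self.events - x if self.is_configuration(x | {e})]

# The conditional example: one initial sample, then two conflicting branches.
cond = EventStructure(
    events=["sam_x", "sam_then", "sam_else", "rtn_then", "rtn_else"],
    causes={"sam_then": {"sam_x"}, "sam_else": {"sam_x"},
            "rtn_then": {"sam_x", "sam_then"}, "rtn_else": {"sam_x", "sam_else"}},
    conflicts=[("sam_then", "sam_else"), ("sam_then", "rtn_else"),
               ("rtn_then", "sam_else"), ("rtn_then", "rtn_else")],
)

print(cond.is_configuration({"sam_x", "sam_then"}))
print(sorted(sorted(c) for c in cond.covers({"sam_x"})))
```

The configuration containing only the initial sample is covered by exactly two configurations, one per branch, and no configuration contains events from both branches.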


Compositionality. In order to give semantics to the language in a compositional manner, we must consider arbitrary open programs, i.e. with free parameters. Therefore we also represent each call to a parameter $a$ as a read event, marked $\mathrm{Rd}\,a$. For instance the program x + y with two real parameters becomes an event structure with one read event for each of x and y.

Note that the read actions on x and y are independent in the program (no order is specified), and the event structure respects this independence.

Our dependency graphs are event structures where each event carries information about the syntactic operation it comes from, a label, which depends on the typing context of the program:

$$\mathcal{L}^{\mathrm{static}}_{\Gamma} ::= \mathrm{Rd}\,a \mid \mathrm{Rtn} \mid \mathrm{Sam} \mid \mathrm{Sco},$$

where $a$ ranges over variables $a : A$ in $\Gamma$.


Definition 2. A dependency graph over $\Gamma \vdash B$ is an event structure $G$ along with a labelling map $\mathrm{lbl} : G \to \mathcal{L}^{\mathrm{static}}_{\Gamma}$ where any two events $s, s' \in G$ labelled $\mathrm{Rtn}$ are in conflict, and all maximal configurations of $G$ are of the form $[r]$ for $r \in G$ a return event.


The condition on return events ensures that in any configuration of G there is 
at most one return event. Events of G are called static events. 

We use dependency graphs as a causal representation of programs, reflect- 
ing the dependency between different parts of the program. In what follows we 
enrich this representation with runtime information in order to keep track of 
the dataflow of the program (in Sect. 3.3), and the associated distributions (in 
Sect. 3.4). 


3.3 Runtime Values and Dataflow Graphs 


We have seen how data dependency can be captured by representing a program $P$ as a dependency graph $G_P$. But observe that this graph does not give any runtime information about the data in $P$; every event $s \in G_P$ only carries a label $\mathrm{lbl}(s)$ indicating the class of action it belongs to. (For an event labelled $\mathrm{Rd}\,a$, $G_P$ does not specify the value at $a$; whereas at runtime this will be filled by an element of $\llbracket A \rrbracket$ where $A$ is the type of $a$.)

To each label, we can associate a measurable space of possible runtime values:

$$\mathcal{Q}(\mathrm{Rd}\,b) = \llbracket \Gamma(b) \rrbracket \qquad \mathcal{Q}(\mathrm{Rtn}) = \llbracket B \rrbracket \qquad \mathcal{Q}(\mathrm{Sam}) = (\mathbb{R}, \Sigma_{\mathbb{R}}) \qquad \mathcal{Q}(\mathrm{Sco}) = (\mathbb{R}, \Sigma_{\mathbb{R}}).$$

Then, in a particular execution, an event $s \in G_P$ has a value in $\mathcal{Q}(\mathrm{lbl}(s))$, and can be instead labelled by the following expanded set:

$$\mathcal{L}^{\mathrm{run}}_{\Gamma \vdash B} ::= \mathrm{Rd}\,a\,v \mid \mathrm{Rtn}\,v \mid \mathrm{Sam}\,r \mid \mathrm{Sco}\,r$$

where $r$ ranges over real numbers; in $\mathrm{Rd}\,a\,v$, $a : A \in \Gamma$ and $v \in \llbracket A \rrbracket$; and in $\mathrm{Rtn}\,v$, $v$ ranges over elements of $\llbracket B \rrbracket$. Notice that there is an obvious forgetful map $\alpha : \mathcal{L}^{\mathrm{run}}_{\Gamma \vdash B} \to \mathcal{L}^{\mathrm{static}}_{\Gamma}$, discarding the runtime value. This runtime value can be extracted from a label in $\mathcal{L}^{\mathrm{run}}_{\Gamma \vdash B}$, as follows:

$$q(\mathrm{Rd}\,b\,v) = v \qquad q(\mathrm{Rtn}\,v) = v \qquad q(\mathrm{Sam}\,r) = r \qquad q(\mathrm{Sco}\,r) = r.$$

In particular, we have $q(\ell) \in \mathcal{Q}(\alpha(\ell))$.

Such runtime events organise themselves in an event structure $E_P$, labelled over $\mathcal{L}^{\mathrm{run}}_{\Gamma \vdash B}$, the runtime graph of $P$. Runtime graphs are in general uncountable, and so difficult to represent pictorially. It can be done in some simple, finite cases: the graph for if a then 2 else 3 is depicted on the right. Recall that in dependency graphs conflict was used to represent conditional branches; here instead conflict is used to keep disjoint the possible outcomes of the same static event. (Necessarily, this static event must be a sample or a read, since other actions (return, score) are deterministic.)

[Figure: runtime graph of if a then 2 else 3. The events Rd a tt and Rd a ff are in minimal conflict; Rd a tt is followed by Rtn 2 and Rd a ff by Rtn 3.]

Intuitively one can project runtime events to static events by erasing the runtime information; this suggests the existence of a function $\pi_P : E_P \to G_P$. This function will turn out to satisfy the axioms of a rigid map of event structures:

Definition 3. Given event structures $(E, \le_E, \#_E)$ and $(G, \le_G, \#_G)$, a function $\pi : E \to G$ is a rigid map if


- it preserves configurations: for every $x \in \mathcal{C}(E)$, $\pi x \in \mathcal{C}(G)$;
- it is locally injective: for every $x \in \mathcal{C}(E)$ and $e, e' \in x$, if $\pi(e) = \pi(e')$ then $e = e'$;
- it preserves dependency: if $e \le_E e'$ then $\pi(e) \le_G \pi(e')$.


In general $\pi$ is not injective, since many runtime events may correspond to the same static event; in that case however the axioms will require them to be in conflict. The last condition in the definition ensures that all causal dependencies come from $G$.
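
On finite examples the three axioms of a rigid map can be verified by brute force. The sketch below does this for the runtime graph of if a then 2 else 3; the dependency graph it maps to (one read followed by two conflicting returns), the tuple encoding of event structures, and all names are assumptions made for the illustration.

```python
from itertools import chain, combinations

def is_config(x, causes, conflicts):
    """x is down-closed for the strict causes and contains no conflicting pair."""
    return (all(causes.get(e, set()) <= x for e in x)
            and not any(frozenset((a, b)) in conflicts for a in x for b in x if a != b))

def configurations(events, causes, conflicts):
    """Enumerate all configurations of a finite event structure."""
    events = sorted(events)
    subsets = chain.from_iterable(combinations(events, n) for n in range(len(events) + 1))
    return [set(s) for s in subsets if is_config(set(s), causes, conflicts)]

def is_rigid(pi, E, G):
    """E and G are triples (events, strict causes, conflicts); pi is a dict."""
    events_e, causes_e, conflicts_e = E
    _, causes_g, conflicts_g = G
    for x in configurations(events_e, causes_e, conflicts_e):
        image = {pi[e] for e in x}
        if not is_config(image, causes_g, conflicts_g):   # preserves configurations
            return False
        if len(image) != len(x):                          # locally injective
            return False
    # preserves dependency: e1 < e in E implies pi(e1) < pi(e) in G
    return all(pi[e1] in causes_g.get(pi[e], set())
               for e in events_e for e1 in causes_e.get(e, set()))

pair = lambda a, b: frozenset((a, b))
runtime = ({"rd_tt", "rd_ff", "rtn_2", "rtn_3"},
           {"rtn_2": {"rd_tt"}, "rtn_3": {"rd_ff"}},
           {pair("rd_tt", "rd_ff"), pair("rd_tt", "rtn_3"),
            pair("rtn_2", "rd_ff"), pair("rtn_2", "rtn_3")})
static = ({"rd", "rtn_then", "rtn_else"},
          {"rtn_then": {"rd"}, "rtn_else": {"rd"}},
          {pair("rtn_then", "rtn_else")})
erase = {"rd_tt": "rd", "rd_ff": "rd", "rtn_2": "rtn_then", "rtn_3": "rtn_else"}

print(is_rigid(erase, runtime, static))
```

The erasure map is not injective, since both reads are sent to the same static event, but the two reads are in conflict and therefore never occur in the same configuration.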

Given $x \in \mathcal{C}(G_P)$ we define the possible runtime values for $x$ as the set $\mathcal{Q}(x)$ of functions mapping $s \in x$ to a runtime value in $\mathcal{Q}(\mathrm{lbl}(s))$; in other words $\mathcal{Q}(x) = \prod_{s \in x} \mathcal{Q}(\mathrm{lbl}(s))$. A configuration $x'$ of $E_P$ can be viewed as a trace over $\pi_P x'$; hence $\pi_P^{-1}\{x\} := \{x' \in \mathcal{C}(E_P) \mid \pi_P x' = x\}$ is the set of traces of $P$ over $x$. We can now define dataflow graphs:


Definition 4. A dataflow graph on $\Gamma \vdash B$ is a triple $S = (E_S, G_S, \pi_S : E_S \to G_S)$ with $G_S$ a dependency graph and $E_S$ a runtime graph, such that:


- $\pi_S$ is a rigid map and $\mathrm{lbl} \circ \pi_S = \alpha \circ \mathrm{lbl} : E_S \to \mathcal{L}^{\mathrm{static}}_{\Gamma}$;
- for each $x \in \mathcal{C}(G_S)$, the following function is injective:

$$q_x : \pi_S^{-1}\{x\} \to \mathcal{Q}(x), \qquad x' \mapsto (s \mapsto q(\mathrm{lbl}(s'))),$$

  where $s'$ is the unique event of $x'$ with $\pi_S s' = s$;
- if $e, e' \in E_S$ with $e \sim e'$ then $\pi e = \pi e'$, and moreover $e$ and $e'$ are either both sample or both read events.


As mentioned above, maximal configurations of $E_P$ correspond to total traces of $P$, and will be the states of the Markov chain in Sect. 5. By the second axiom, they can be seen as pairs $(x \in \mathcal{C}(G_S), q \in \mathcal{Q}(x))$. Because of the third axiom, $E_S$ is always confusion-free.

Measurable Fibres. Rigid maps are convenient in this context because they allow for reasoning about program traces by organising them as fibres. The key property we rely on is the following:


Lemma 2. If $\pi : E \to G$ is a rigid map of event structures, then the induced map $\pi : \mathcal{C}(E) \to \mathcal{C}(G)$ is a discrete fibration: that is, for every $y \in \mathcal{C}(E)$, if $x \subseteq \pi y$ for some $x \in \mathcal{C}(G)$, then there is a unique $y' \in \mathcal{C}(E)$ such that $y' \subseteq y$ and $\pi y' = x$.


This enables an essential feature of our approach: given a configuration $x$ of the dataflow graph $G$, the fibre $\pi^{-1}\{x\}$ over it contains all the (possibly partial) program traces over $x$, i.e. those whose path through the program corresponds to that of $x$. Additionally the lemma implies that every pair of configurations $x, x' \in \mathcal{C}(G)$ such that $x \subseteq x'$ induces a restriction map $r_{x,x'} : \pi^{-1}\{x'\} \to \pi^{-1}\{x\}$, whose action on a program trace over $x'$ is to return its prefix over $x$.

Although there is no measure-theoretic structure in the definition of dataflow graphs, we can recover it: for every $x \in \mathcal{C}(G_S)$, the fibre $\pi_S^{-1}\{x\}$ can be equipped with the $\sigma$-algebra induced from $\Sigma_{\mathcal{Q}(x)}$ via $q_x$; it is generated by sets $q_x^{-1} U$ for $U \in \Sigma_{\mathcal{Q}(x)}$.

It is easy to check that this makes the restriction map $r_{x,x'} : \pi_S^{-1}\{x'\} \to \pi_S^{-1}\{x\}$ measurable for each pair $x, x'$ of configurations with $x \subseteq x'$. (Note that this makes $S$ a measurable event structure in the sense of [16].) Moreover, the map $q_{x,s} : \pi_S^{-1}\{x\} \to \mathcal{Q}(\mathrm{lbl}(s))$ for $s \in x \in \mathcal{C}(G_S)$, mapping $x' \in \pi_S^{-1}\{x\}$ to $q(\mathrm{lbl}(s'))$ for $s'$ the unique antecedent by $\pi_S$ of $s$ in $x'$, is also measurable.

We will also make use of the following result: 


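Lemma 3. Consider a dataflow graph $S$ and $x, y, z \in \mathcal{C}(G_S)$ with $x \subseteq y$, $x \subseteq z$, and $y \cup z \in \mathcal{C}(G_S)$. If $y \cap z = x$, then the space $\pi_S^{-1}\{y \cup z\}$ is isomorphic to the set

$$\{(u_y, u_z) \in \pi_S^{-1}\{y\} \times \pi_S^{-1}\{z\} \mid r_{x,y}(u_y) = r_{x,z}(u_z)\},$$

with $\sigma$-algebra generated by sets of the form $\{(u_y, u_z) \in X_y \times X_z \mid r_{x,y}(u_y) = r_{x,z}(u_z)\}$ for $X_y \in \Sigma_{\pi_S^{-1}\{y\}}$ and $X_z \in \Sigma_{\pi_S^{-1}\{z\}}$.

(For the reader with knowledge of category theory, this says exactly that the diagram

$$\begin{array}{ccc}
\pi_S^{-1}\{y \cup z\} & \xrightarrow{\;r_{y,\,y \cup z}\;} & \pi_S^{-1}\{y\} \\
{\scriptstyle r_{z,\,y \cup z}} \big\downarrow & & \big\downarrow {\scriptstyle r_{x,y}} \\
\pi_S^{-1}\{z\} & \xrightarrow{\;r_{x,z}\;} & \pi_S^{-1}\{x\}
\end{array}$$

is a pullback in the category of measurable spaces.)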


3.4 Quantitative Dataflow Graphs 


We can finally introduce the last bit of information we need about programs in order to perform inference: the probabilistic information. So far, in a dataflow graph, we know when the program is sampling, but not from which distribution. This is resolved by adding for each sample event $s$ in the dependency graph a kernel $k_s : \pi^{-1}\{[s)\} \rightsquigarrow \pi^{-1}\{[s]\}$. Given a trace $x$ over $[s)$, $k_s$ specifies a probability distribution according to which $x$ will be extended to a trace over $[s]$. This distribution must of course have support contained in the set $r_{[s),[s]}^{-1}\{x\}$ of traces over $[s]$ of which $x$ is a prefix; this is the meaning of the technical condition in the definition below.


Definition 5. A quantitative dataflow graph is a tuple $S = (E_S, G_S, \pi : E_S \to G_S, (k^S_s))$ where for each sample event $s \in G_S$, $k^S_s$ is a kernel $\pi^{-1}\{[s)\} \rightsquigarrow \pi^{-1}\{[s]\}$ satisfying for all $x \in \pi^{-1}\{[s)\}$,

$$k^S_s\big(x, \pi^{-1}\{[s]\} \setminus r_{[s),[s]}^{-1}\{x\}\big) = 0.$$

This axiom stipulates that any extension $x' \in \pi_S^{-1}\{[s]\}$ of $x \in \pi_S^{-1}\{[s)\}$ drawn by $k_s$ must contain $x$; in effect $k_s$ only samples the runtime value for $s$.


From Graphs to Kernels. We show how to collapse a quantitative dataflow graph $S$ on $\Gamma \vdash B$ to a kernel $\llbracket \Gamma \rrbracket \rightsquigarrow \llbracket B \rrbracket$. First, we extend the kernel family on sampling events $(k^S_s : \pi^{-1}\{[s)\} \rightsquigarrow \pi^{-1}\{[s]\})$ to a family $(k^{S[\gamma]}_s : \pi^{-1}\{[s)\} \rightsquigarrow \pi^{-1}\{[s]\})$ defined on all events $s \in S$, parametrised by the value of the environment $\gamma \in \llbracket \Gamma \rrbracket$. To define $k^{S[\gamma]}_s(x, -)$ it is enough to specify its value on the generating set for $\Sigma_{\pi^{-1}\{[s]\}}$. As we have seen this contains elements of the form $q_{[s]}^{-1}(U)$ with $U \in \Sigma_{\mathcal{Q}([s])}$. We distinguish the following cases corresponding to the nature of $s$:


- If $s$ is a sample event, $k^{S[\gamma]}_s = k^S_s$.
- If $s$ is a read on $a : A$, any $x \in \pi^{-1}\{[s)\}$ has runtime information $q_{[s)}(x)$ in $\mathcal{Q}([s))$ which can be extended to $\mathcal{Q}([s])$ by mapping $s$ to $\gamma(a)$:

$$k^{S[\gamma]}_s\big(x, q_{[s]}^{-1}(U)\big) = \delta_{q_{[s)}(x)[s := \gamma(a)]}(U).$$

- If $s$ is a return or a score event: any $x \in \pi^{-1}\{[s)\}$ has at most one extension $o(x) \in \pi^{-1}\{[s]\}$ (because return and score events cannot be involved in a minimal conflict): $k^{S[\gamma]}_s\big(x, q_{[s]}^{-1}(U)\big) = \delta_{q_{[s]}(o(x))}(U)$. If $o(x)$ does not exist, we let $k^{S[\gamma]}_s(x, X) = 0$.


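We can now define a kernel $k^{S[\gamma]}_{x,s} : \pi^{-1}\{x\} \rightsquigarrow \pi^{-1}\{x'\}$ for every atomic extension $x \stackrel{s}{-\!\!\subset} x'$ in $G_S$, i.e. when $x' \setminus x = \{s\}$, as follows:

$$k^{S[\gamma]}_{x,s}(y, U) = k^{S[\gamma]}_s\big(r_{[s),x}(y), \{w \in \pi_S^{-1}\{[s]\} \mid (y, w) \in U\}\big).$$

The second argument to $k^{S[\gamma]}_s$ above is always measurable, by a standard measure-theoretic argument based on Lemma 3, as $x \cap [s] = [s)$.
From this definition we derive: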
From this definition we derive: 


Lemma 4. If $x \stackrel{s_1}{-\!\!\subset} x_1$ and $x \stackrel{s_2}{-\!\!\subset} x_2$ are concurrent extensions of $x$ (i.e. $s_1$ and $s_2$ are not in conflict), then $k^{S[\gamma]}_{x_1,s_2} \circ k^{S[\gamma]}_{x,s_1} = k^{S[\gamma]}_{x_2,s_1} \circ k^{S[\gamma]}_{x,s_2}$.

Given a configuration $x \in \mathcal{C}(G_S)$ and a covering chain $\emptyset \stackrel{s_1}{-\!\!\subset} x_1 \ldots \stackrel{s_n}{-\!\!\subset} x_n = x$, we can finally define a measure on $\pi^{-1}\{x\}$:

$$\mu^{S[\gamma]}_x = k^{S[\gamma]}_{x_{n-1},s_n} \circ \ldots \circ k^{S[\gamma]}_{\emptyset,s_1}(\ast, -),$$

where $\ast$ is the only trace over $\emptyset$. The particular covering chain used does not matter by the previous lemma. Using this, we can define the kernel of a quantitative dataflow graph $S$ as follows:


s £ 
kernel(S)(7, X) = J e as 
rEGs,lbl(r)=Rtn 


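where the measurable map $q_{[r],r} : \pi^{-1}\{[r]\} \to \llbracket B \rrbracket$ looks up the runtime value of $r$ in an element of the fibre over $[r]$ (defined in Sect. 3.3).


Lemma 5. $\mathrm{kernel}(S)$ is an s-finite kernel $\llbracket \Gamma \rrbracket \rightsquigarrow \llbracket B \rrbracket$.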


4 Programs as Labelled Event Structures 


We now detail our interpretation of programs as quantitative dataflow graphs. 
Our interpretation is given by induction, similarly to the measure-theoretic inter- 
pretation given in Sect. 2.3, in which composition of kernels plays a central role. 
In Sect. 4.1, we discuss how to compose quantitative dataflow graphs, and in 
Sect. 4.2, we define our interpretation. 


4.1 Composition of Probabilistic Event Structures


Consider two quantitative dataflow graphs, $S$ on $\Gamma \vdash A$, and $T$ on $\Gamma, a : A \vdash B$ where $a$ does not occur in $\Gamma$. In what follows we show how they can be composed to form a quantitative dataflow graph $T \odot^a S$ on $\Gamma \vdash B$.

Unlike in the kernel model of Sect. 2.3, we will need two notions of composition. The first one is akin to the usual sequential composition: actions in $T$ must wait on $S$ to return before they can proceed. The second is closer to parallel composition: actions of $T$ which do not depend on a read of the variable $a$ can be executed in parallel with $S$. The latter composition is used to interpret the let construct. In let a = M in N, we want all the probabilistic actions or reads on other variables which do not depend on the value of $a$ to be in parallel with $M$. However, in a program such as $\mathtt{case}\ M\ \mathtt{of}\ \{(i, x) \Rightarrow N_i\}_{i \in I}$ we do not want any actions of $N_i$ to start before the selected branch is known, i.e. before the return value of $M$ is known.

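By way of illustration, consider the following simple example, in which we only consider runtime graphs, ignoring the rest of the structure for now. Suppose $S$ and $T$ are given by

[Figure: runtime graphs S and T. In S, the events Rd b tt and Rd b ff are in minimal conflict; Rd b tt is followed by Rtn ff and Rd b ff by Rtn tt. In T, an event Sam r appears alongside the conflicting events Rd a tt and Rd a ff, with return events Rtn ((), tt) and Rtn ((), ff) in minimal conflict.]

The graph $S$ can be seen to correspond to the program if b then ff else tt and $T$ to the pairing (sample d (0), a) for any $d$. Here $S$ is a runtime graph on $b : \mathbb{B} \vdash \mathbb{B}$ and $T$ on $a : \mathbb{B}, b : \mathbb{B} \vdash \mathbb{B}$.
Both notions of composition are displayed in the diagram below. The sequential composition (left) corresponds to


if b then (sample d (0), ff) else (sample d (0), tt)


and the parallel composition to (sample d (0), if b then ff else tt):

[Figure: the sequential composition of S and T (left), in which the conflicting events Rd b tt and Rd b ff are each followed by their own Sam r event and then by Rtn ff and Rtn tt respectively; and the parallel composition of S and T (right), in which a single Sam r event is concurrent with the conflicting events Rd b tt and Rd b ff, with return events Rtn ff and Rtn tt.]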

Composition of Runtime and Dependency Graphs. Let us now define both composition operators at the level of the event structures. Through the bijection $\mathcal{L}^{\mathrm{static}}_{\Gamma} \simeq \mathcal{L}^{\mathrm{run}}_{\Gamma' \vdash 1}$, where $\Gamma'(a) = 1$ for all $a \in \mathrm{dom}(\Gamma)$, we will see dependency graphs and runtime graphs as the same kind of objects, event structures labelled over $\mathcal{L}^{\mathrm{run}}_{\Gamma \vdash B}$.

The two compositions $S \odot^a_{\mathrm{par}} T$ and $S \odot^a_{\mathrm{seq}} T$ are two instances of the same construction, parametrised by a set of labels $D \subseteq \mathcal{L}^{\mathrm{run}}_{\Gamma, a : A \vdash B}$. Informally, $D$ specifies which events of $T$ are to depend on the return value of $S$ in the resulting composition graph. It is natural to assume in particular that $D$ contains all reads on $a$, and all return events.

Sequential and parallel composition are instances of this construction where $D$ is set to one of the following:

$$D^{\Gamma, a : A \vdash B}_{\mathrm{seq}} = \mathcal{L}^{\mathrm{run}}_{\Gamma, a : A \vdash B} \qquad D^{\Gamma, a : A \vdash B}_{\mathrm{par}} = \{\mathrm{Rd}\,a\,v, \mathrm{Rtn}\,v \in \mathcal{L}^{\mathrm{run}}_{\Gamma, a : A \vdash B}\}$$

We proceed to describe the construction for an abstract $D$. Let $T$ be an event structure labelled by $\mathcal{L}^{\mathrm{run}}_{\Gamma, a : A \vdash B}$ and $S$ labelled by $\mathcal{L}^{\mathrm{run}}_{\Gamma \vdash A}$. A configuration $x \in \mathcal{C}(S)$ is a justification of $y \in \mathcal{C}(T)$ when


1. if $\mathrm{lbl}(y)$ intersects $D$, then $x$ contains a return event;
2. for all $t \in y$ with label $\mathrm{Rd}\,a\,v$, there exists an event $s \in x$ labelled $\mathrm{Rtn}\,v$.


In particular if $\mathrm{lbl}(y)$ does not intersect $D$, then any configuration of $S$ is a justification of $y$. A minimal justification of $y$ is a justification that admits no proper subset which is also a justification of $y$. We now define the event structure $S \cdot_D T$ as follows:


- Events: $S \cup \{(x, t) \mid x \in \mathcal{C}(S), t \in T, x \text{ minimal justification for } [t]\}$;
- Causality: $\le_S \cup \{((x, t), (x', t')) \mid x \subseteq x' \wedge t \le t'\} \cup \{(s, (x, t)) \mid s \in x\}$;
- Conflict: the symmetric closure of $\#_S \cup \{((x, t), (x', t')) \mid x \cup x' \notin \mathcal{C}(S) \vee t \#_T t'\} \cup \{(s, (x, t)) \mid \{s\} \cup x \notin \mathcal{C}(S)\}$.

Lemma 6. $S \cdot_D T$ is an event structure, and the following is an order-isomorphism:

$$\langle -, - \rangle : \{(x, y) \in \mathcal{C}(S) \times \mathcal{C}(T) \mid x \text{ is a justification of } y\} \cong \mathcal{C}(S \cdot_D T).$$

This event structure is not quite what we want, since it still contains return events from $S$ and reads on $a$ from $T$. To remove them, we use the following general construction. Given a labelled event structure $E$ and $V \subseteq E$ a set of visible events, its projection $E \downarrow V$ has events $V$ and causality, conflict and labelling inherited from $E$. Thus the composition of $S$ and $T$ is:

$$S \odot^a_D T := S \cdot_D T \downarrow (\{s \in S \mid s \text{ not a return}\} \cup \{(x, t) \mid t \text{ not a read on } a\}).$$

As a result $S \odot^a_D T$ is labelled over $\mathcal{L}^{\mathrm{run}}_{\Gamma \vdash B}$ as needed.


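Dataflow Information. We now explain how this construction lifts to dataflow graphs. Consider dataflow graphs $S = (E_S, G_S, \pi_S : E_S \to G_S)$ on $\Gamma \vdash A$ and $T = (E_T, G_T, \pi_T : E_T \to G_T)$ on $\Gamma, a : A \vdash B$. Given $D \subseteq \mathcal{L}^{\mathrm{static}}_{\Gamma, a : A}$ we define

$$E_{S \cdot_D T} = E_S \cdot_{\alpha^{-1} D} E_T \qquad G_{S \cdot_D T} = G_S \cdot_D G_T$$
$$E_{S \odot^a_D T} = E_S \odot^a_{\alpha^{-1} D} E_T \qquad G_{S \odot^a_D T} = G_S \odot^a_D G_T$$

Lemma 7. The maps $\pi_S$ and $\pi_T$ extend to rigid maps

$$\pi_{S \cdot_D T} : E_{S \cdot_D T} \to G_{S \cdot_D T}$$
$$\pi_{S \odot^a_D T} : E_{S \odot^a_D T} \to G_{S \odot^a_D T}$$

Moreover, if $\langle x, y \rangle \in \mathcal{C}(E_{S \cdot_D T})$, $\langle \pi_S x, \pi_T y \rangle$ is a well-defined configuration of $G_{S \cdot_D T}$. As a result, for $\langle x, y \rangle \in \mathcal{C}(G_{S \cdot_D T})$, we have an injection $\varphi_{x,y} : \pi^{-1}\{\langle x, y \rangle\} \to \pi^{-1}\{x\} \times \pi^{-1}\{y\}$ making the following diagram commute:

$$\begin{array}{ccc}
\pi^{-1}\{\langle x, y \rangle\} & \xrightarrow{\;\varphi_{x,y}\;} & \pi^{-1}\{x\} \times \pi^{-1}\{y\} \\
{\scriptstyle q_{\langle x, y \rangle}} \big\downarrow & & \big\downarrow {\scriptstyle q_x \times q_y} \\
\mathcal{Q}(\langle x, y \rangle) & \cong & \mathcal{Q}(x) \times \mathcal{Q}(y)
\end{array}$$

In particular, $\varphi_{x,y}$ is measurable and induces the $\sigma$-algebra on $\pi^{-1}\{\langle x, y \rangle\}$. We write $\varphi_x$ for the map $\varphi_{x,\emptyset}$, an isomorphism.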


Adding Probability. At this point we have defined all the components of dataflow graphs $S \odot^a_D T$ and $S \cdot_D T$. We proceed to make them quantitative.

Observe first that each sampling event of $G_{S \cdot_D T}$ (or equivalently of $G_{S \odot^a_D T}$, since sampling events are never hidden) corresponds either to a sampling event of $G_S$, or to an event $(x, t)$ where $t$ is a sampling event of $G_T$. We consider both cases to define a family of kernels $(k^{S \cdot_D T}_s)$ between the fibres of $S \cdot_D T$. This will in turn induce a family $(k^{S \odot^a_D T}_s)$ on $S \odot^a_D T$.

- If $s$ is a sample event of $G_S$, we use the isomorphisms $\varphi_{[s)}$ and $\varphi_{[s]}$ of Lemma 7 to define:
$$k^{S \cdot_D T}_s(\upsilon, X) = k^S_s\big(\varphi_{[s)}(\upsilon), \varphi_{[s]}(X)\big).$$
- If $s$ corresponds to $(x, t)$ for $t$ a sample event of $G_T$, then for every $X_x \in \Sigma_{\pi_S^{-1}\{x\}}$ and $X_t \in \Sigma_{\pi_T^{-1}\{[t]\}}$ we define


SO T m 
ber (Be Gs y Re ® Me) = a EE (GX). 


By Lemma 7, the sets $\varphi_{x,[t]}^{-1}(X_x \times X_t)$ form a basis for $\Sigma_{\pi^{-1}\{\langle x, [t] \rangle\}}$, so that this definition determines the entire kernel.

So we have defined a kernel $k^{S \cdot_D T}_s$ for each sample event $s$ of $G_{S \cdot_D T}$. We move to the composition $(S \odot^a_D T)$. Recall that the causal history of a configuration $z \in \mathcal{C}(G_{S \odot^a_D T})$ is the set $[z]$, a configuration of $G_{S \cdot_D T}$. We see that hiding does not affect the fibre structure:


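Lemma 8. For any $z \in \mathcal{C}(G_{S \odot^a_D T})$, there is a measurable isomorphism $\psi_z : \pi_{S \odot^a_D T}^{-1}\{z\} \cong \pi_{S \cdot_D T}^{-1}\{[z]\}$.


Using this result and the fact that $G_{S \odot^a_D T} \subseteq G_{S \cdot_D T}$, we may define for each sample event $s$ of $G_{S \odot^a_D T}$:

$$k^{S \odot^a_D T}_s(\upsilon, X) = k^{S \cdot_D T}_s\big(\psi_{[s)}(\upsilon), \psi_{[s]} X\big).$$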


We conclude: 


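Lemma 9. $S \odot^a_D T := (G_{S \odot^a_D T}, E_{S \odot^a_D T}, \pi_{S \odot^a_D T}, (k^{S \odot^a_D T}_s))$ is a quantitative dataflow graph on $\Gamma \vdash B$.


Multicomposition. By chaining this composition, we can compose on several variables at once. Given quantitative dataflow graphs $S_i$ on $\Gamma \vdash A_i$ and $T$ on $\Gamma, a_1 : A_1, \ldots, a_n : A_n \vdash A$ we define

$$(S_i)_i \odot^{(a_i)}_{\mathrm{par}} T := S_1 \odot^{a_1}_{\mathrm{par}} (\cdots (S_n \odot^{a_n}_{\mathrm{par}} T))$$
$$(S_i)_i \odot^{(a_i)}_{\mathrm{seq}} T := S_1 \odot^{a_1}_{\mathrm{seq}} (\cdots (S_n \odot^{a_n}_{\mathrm{seq}} T))$$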


4.2 Interpretation of Programs 


We now describe how to interpret programs of our language using quantita- 
tive dataflow graphs. To do so we follow the same pattern as for the measure- 
theoretical interpretation given in Sect. 2.3. 


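Interpretation of Functions. Given a measurable function $f : \llbracket A \rrbracket \to \llbracket B \rrbracket$, we define the quantitative dataflow graph

$$S_f = \Big( \sum_{v \in \llbracket A \rrbracket} \big(\mathrm{Rd}\,a\,v \to \mathrm{Rtn}\,(f\,v)\big) \;\longrightarrow\; \big(\mathrm{Rd}\,a \to \mathrm{Rtn}\big) \Big)$$

whose runtime graph has, for each $v \in \llbracket A \rrbracket$, an event $\mathrm{Rd}\,a\,v$ followed by $\mathrm{Rtn}\,(f\,v)$, and whose dependency graph is $\mathrm{Rd}\,a$ followed by $\mathrm{Rtn}$.

We then define $\llbracket f\,M \rrbracket_{\mathcal{G}}$ as $\llbracket M \rrbracket_{\mathcal{G}} \odot^a_{\mathrm{par}} S_f$ where $a$ is chosen so as not to occur free in $M$.

Probabilistic Actions. In order to interpret scoring and sampling primitives, we need the following two quantitative dataflow graphs:

$$\mathrm{score}^a = \Big( \sum_{r \in \mathbb{R}} \big(\mathrm{Rd}\,a\,r \to \mathrm{Sco}\,r \to \mathrm{Rtn}\,()\big) \;\longrightarrow\; \big(\mathrm{Rd}\,a \to \mathrm{Sco} \to \mathrm{Rtn}\big) \Big)$$

$$\mathrm{sample}^a_d = \Big( \sum_{r \in \mathbb{R}^n} \big(\mathrm{Rd}\,a\,r \to \mathrm{Sam}\,s \to \mathrm{Rtn}\,()\big) \;\longrightarrow\; \big(\mathrm{Rd}\,a \to \mathrm{Sam} \to \mathrm{Rtn}\big),\ k_{\mathrm{sam}} \Big)$$

and we define $k_{\mathrm{sam}}$ by integrating the density function $d$; here we identify $\mathcal{Q}(\{\mathrm{Rd}\,a, \mathrm{Sam}\})$ and $\pi^{-1}\{\{\mathrm{Rd}\,a, \mathrm{Sam}\}\}$:

$$k_{\mathrm{sam}}(\{\mathrm{Rd}\,a\,r\}, U) = \int_{q \in U,\ q(\mathrm{Rd}\,a) = r} d(r, q(\mathrm{Sam}))\,\mathrm{d}\lambda.$$

We can now interpret scoring and sampling constructs:

$$\llbracket \mathtt{score}\ M \rrbracket_{\mathcal{G}} = \llbracket M \rrbracket_{\mathcal{G}} \odot^a_{\mathrm{par}} \mathrm{score}^a \qquad \llbracket \mathtt{sample}\ d\ (M) \rrbracket_{\mathcal{G}} = \llbracket M \rrbracket_{\mathcal{G}} \odot^a_{\mathrm{par}} \mathrm{sample}^a_d.$$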


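Interpretation of Tuples and Variables. Given a family $(a_i)_{i \in I}$, we define the dataflow graph $\mathrm{tuple}_{(a_i : A_i)}$ on $a_1 : A_1, \ldots, a_n : A_n \vdash A_1 \times \ldots \times A_n$ as follows. Its set of events is the disjoint union

$$\bigcup_{i \in I,\ v \in \llbracket A_i \rrbracket} \mathrm{Rd}\,a_i\,v \;+\; \bigcup_{v \in \llbracket A_1 \times \ldots \times A_n \rrbracket} \mathrm{Rtn}\,v$$

where the conflict is induced by $\mathrm{Rd}\,a_i\,v \sim \mathrm{Rd}\,a_i\,v'$ for $v \neq v'$; and causality contains all the pairs $\mathrm{Rd}\,a_i\,v \to \mathrm{Rtn}\,(v_1, \ldots, v_n)$ where $v_i = v$. Then we form a quantitative dataflow graph $\mathrm{Tuple}_{(a_i : A_i)}$, whose dependency graph is $\mathrm{tuple}_{(a_i : 1)}$ (up to the bijection $\mathcal{L}^{\mathrm{static}}_{\Gamma} \simeq \mathcal{L}^{\mathrm{run}}_{\Gamma' \vdash 1}$ where $\Gamma'(a) = 1$ for $a \in \mathrm{dom}(\Gamma)$); and the runtime graph is $\mathrm{tuple}_{(a_i : A_i)}$, along with the obvious rigid map between them.
We then define the semantics of $(M_1, \ldots, M_n)$:

$$\llbracket (M_1, \ldots, M_n) \rrbracket_{\mathcal{G}} = (\llbracket M_i \rrbracket_{\mathcal{G}})_i \odot^{(a_i)}_{\mathrm{par}} \mathrm{Tuple}_{(a_i : A_i)},$$

where the $a_i$ are chosen so as not to occur free in any of the $M_j$. This construction is also useful to interpret variables:

$$\llbracket a \rrbracket_{\mathcal{G}} = \mathrm{Tuple}_{a : A} \quad \text{where } \Gamma \vdash a : A.$$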


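Interpretation of Pattern Matching. Consider now a term of the form $\mathtt{case}\ M\ \mathtt{of}\ \{(i, a) \Rightarrow N_i\}_{i \in I}$. By induction, we have that $\llbracket N_i \rrbracket_{\mathcal{G}}$ is a quantitative dataflow graph on $\Gamma, a : A_i \vdash B$. Let us write $\llbracket N_i \rrbracket^{*}_{\mathcal{G}}$ for the quantitative dataflow graph on $\Gamma, a : (\sum_{i \in I} A_i) \vdash B$ obtained by relabelling events of the form $\mathrm{Rd}\,a\,v$ to $\mathrm{Rd}\,a\,(i, v)$, and sequentially precomposing with $\mathrm{Tuple}_{a : \sum_{i \in I} A_i}$. This ensures that minimal events in $\llbracket N_i \rrbracket^{*}_{\mathcal{G}}$ are reads on $a$. We then build the quantitative dataflow graph $\sum_{i \in I} \llbracket N_i \rrbracket^{*}_{\mathcal{G}}$ on $\Gamma, a : \sum_{i \in I} A_i \vdash B$. This can be composed with $\llbracket M \rrbracket_{\mathcal{G}}$:

$$\llbracket \mathtt{case}\ M\ \mathtt{of}\ \{(i, a) \Rightarrow N_i\}_{i \in I} \rrbracket_{\mathcal{G}} = \llbracket M \rrbracket_{\mathcal{G}} \odot^a_{\mathrm{seq}} \Big( \sum_{i \in I} \llbracket N_i \rrbracket^{*}_{\mathcal{G}} \Big).$$

It is crucial here that one uses sequential composition: none of the branches must be evaluated until the outcome of $M$ is known.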


Adequacy of Composition. We now prove that our interpretation is adequate 
with respect to the measure-theoretic semantics described in Sect. 2.3. Given
any subset $D \subseteq \mathrm{static}_{\Gamma}$ containing returns and reads on $a$, we show that the
composition $S \odot^{a}_{D} T$ does implement the composition of kernels:


Theorem 1. For $S$ a quantitative dataflow graph on $\Gamma \vdash A$ and $T$ on $\Gamma, a : A \vdash B$,
we have


$$\mathrm{kernel}(S \odot^{a}_{D} T) = \mathrm{kernel}(T) \circ \mathrm{kernel}(S) : [\![\Gamma]\!] \rightsquigarrow [\![B]\!].$$


From this result, we can deduce that the semantics in terms of quantitative
dataflow graphs is adequate with respect to the measure-theoretic semantics:


Theorem 2. For every term $\Gamma \vdash M : A$, $\mathrm{kernel}([\![M]\!]_G) = [\![M]\!]$.


5 An Inference Algorithm 


In this section, we exploit the intensional semantics defined above and define 
a Metropolis-Hastings inference algorithm. We start, in Sect. 5.1, by giving a 
concrete presentation of those quantitative dataflow graphs arising as the inter- 
pretation of probabilistic programs; we argue this makes them well-suited for 
manipulation by an algorithm. Then, in Sect. 5.2, we give a more formal intro- 
duction to the Metropolis-Hastings sampling methods than that given in Sect. 3. 
Finally, in Sect. 5.3, we build the proposal kernel on which our implementation 
relies, and conclude. 


5.1 A Concrete Presentation of Probabilistic Dataflow Graphs 


Quantitative dataflow graphs as presented in the previous sections are not easy 
to handle inside of an algorithm: among other things, the runtime graph has an 
uncountable set of events. In this section we show that some dataflow graphs, in 
particular those needed for modelling programs, admit a finite representation. 


Recovering Fibres. Consider a dataflow graph $S = (E_S, G_S, \pi_S)$ on $\Gamma \vdash B$. It
follows from Lemma 3 that the fibre structure of $S$ is completely determined
by the spaces $\pi_S^{-1}\{[s]\}$, for $s \in G_S$, so we focus on trying to give a simplified
representation for those spaces.
First, let us notice that if $s$ is a return or score event, given $x' \in \pi_S^{-1}\{[s]\}$, the
value $q_{x'}(s)$ is determined by $q_{x'}|_{[s)}$. In other words, the map $\pi_S^{-1}\{[s]\} \to \mathcal{Q}([s))$ is
an injection. This is due to the fact that minimal conflict in $E_S$ cannot involve
return or score events. As a result, $E_S$ induces a partial function $o^S_s : \mathcal{Q}([s)) \rightharpoonup
\mathcal{Q}(\mathrm{lbl}(s))$, called the outcome function. It is defined as follows:


$$o^S_s(q) = \begin{cases} q_{x'}(s) & \text{if there exists } x' \in \pi_S^{-1}\{[s]\} \text{ with } q_{x'}|_{[s)} = q, \\ \text{undefined} & \text{otherwise.} \end{cases}$$


Note that $x'$ must be unique by the remark above since its projection to
$\mathcal{Q}([s))$ is determined by $q$. The function $o^S_s$ is partial, because it might be the
case that the event $s$ occurs conditionally on the runtime value on $[s)$.

In fact this structure is all we need in order to describe a dataflow graph:


Lemma 10. Given a dependency graph $G_S$ on $\Gamma \vdash B$, and partial functions
$(o_s) : \mathcal{Q}([s)) \rightharpoonup \mathcal{Q}(\mathrm{lbl}(s))$ for score and return events of $S$, there exists a
dataflow graph $(E_S, G_S, \pi_S : E_S \to G_S)$ whose outcome functions coincide with
the $o_s$. Moreover, there is an order-isomorphism


$$\mathcal{C}(E_S) \cong \{(x, q) \mid x \in \mathcal{C}(G_S),\ q \in \mathcal{Q}(x),\ \forall s \in x,\ o_s(q|_{[s)}) = q(s)\}.$$


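Adding Probabilities. To add probabilities, we simply equip each sample event $s$
of $G_S$ with a density function $d_s : \mathcal{Q}([s)) \times \mathbb{R} \to \mathbb{R}$.


Definition 6. A concrete quantitative dataflow graph is a tuple
$(G_S, (o_s : \mathcal{Q}([s)) \rightharpoonup \mathcal{Q}(\mathrm{lbl}(s))), (d_s : \mathcal{Q}([s)) \times \mathbb{R} \to \mathbb{R})_{s \in \mathrm{sample}(G_S)})$
where $d_s(x, \cdot)$ is normalised.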


Lemma 11. Any concrete quantitative dataflow graph S unfolds to a quantita- 
tive dataflow graph unfold S. 


We see now that the quantitative dataflow graphs arising as the interpretation 
of a program must be the unfolding of a concrete quantitative dataflow graph: 


Lemma 12. For any concrete quantitative dataflow graphs $S$ on $\Gamma \vdash A$ and
$T$ on $\Gamma, a : A \vdash B$, $\mathrm{unfold}\ S \odot^{a}_{D} \mathrm{unfold}\ T$ is the unfolding of a concrete
quantitative dataflow graph. It follows that for any program $\Gamma \vdash M : B$, $[\![M]\!]_G$
is the unfolding of a concrete quantitative dataflow graph.


5.2 Metropolis-Hastings 


Recall that the Metropolis-Hastings algorithm is used to sample from a density
function $d : \mathbb{A} \to \mathbb{R}$ which may not be normalised. Here $\mathbb{A}$ is a measurable state
space, equipped with a measure $\lambda$. The algorithm works by building a Markov
chain whose stationary distribution is $D$, the probability distribution obtained
from $d$ after normalisation:


$$\forall X \in \Sigma_{\mathbb{A}},\quad D(X) = \frac{\int_{x \in X} d(x)\,\mathrm{d}\lambda(x)}{\int_{x \in \mathbb{A}} d(x)\,\mathrm{d}\lambda(x)}.$$


Our presentation and reasoning in the rest of this section are inspired by the
work of Borgström et al. [2].

Preliminaries on Markov Chains. A Markov chain on a measurable state space
$\mathbb{A}$ is a probability kernel $k : \mathbb{A} \rightsquigarrow \mathbb{A}$, viewed as a transition function: given a state
$x \in \mathbb{A}$, the distribution $k(x, \cdot)$ is the distribution from which a next sample state
will be drawn. Usually, each $k(x, \cdot)$ comes with a procedure for sampling: we
will treat this as a probabilistic program $M(x)$ whose output is the next state.
Given an initial state $x \in \mathbb{A}$ and a natural number $n \in \mathbb{N}$, we have a distribution
$k^n(x, \cdot)$ on $\mathbb{A}$ obtained by iterating $k$ $n$ times. We say that the Markov chain $k$
has limit the distribution $\mu$ on $\mathbb{A}$ when


$$\lim_{n \to \infty} \|k^n(x, \cdot) - \mu\| = 0 \quad \text{where} \quad \|\mu_1 - \mu_2\| = \sup_{A \in \Sigma_{\mathbb{A}}} \mu_1(A) - \mu_2(A).$$
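
As an illustration of this notion of limit, the following Python sketch assumes a finite state space, so that a kernel is simply a row-stochastic matrix and the supremum in the distance is attained on the set of states where the first distribution exceeds the second. The two-state chain and the helper names `step` and `tv_distance` are hypothetical and chosen only for this example.

```python
def step(dist, kernel):
    """Push a distribution (list of probabilities) through one kernel step."""
    n = len(kernel)
    return [sum(dist[i] * kernel[i][j] for i in range(n)) for j in range(n)]


def tv_distance(mu1, mu2):
    """sup_A mu1(A) - mu2(A), attained on the states where mu1 exceeds mu2."""
    return sum(max(a - b, 0.0) for a, b in zip(mu1, mu2))


# Hypothetical two-state chain; its stationary distribution is (2/3, 1/3).
kernel = [[0.9, 0.1],
          [0.2, 0.8]]
mu = [2 / 3, 1 / 3]

dist = [1.0, 0.0]      # k^0(x, .) for the initial state x = 0
distances = []         # distances[n - 1] is the distance between k^n(x, .) and mu
for _ in range(50):
    dist = step(dist, kernel)
    distances.append(tv_distance(dist, mu))
```

In this finite setting the recorded distances shrink geometrically, which is the behaviour the definition of limit asks for.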

For the purposes of this paper, we call a Markov chain $k : \mathbb{A} \rightsquigarrow \mathbb{A}$ computable
when there exists a type $A$ such that $[\![A]\!] = \mathbb{A}$ (up to iso) and an
expression without scores $x : A \vdash K : A$ such that $[\![K]\!] = k$. (Recall that programs
without conditioning denote probabilistic kernels, and are easily sampled
from, since all standard distributions in the language are assumed to come with
a built-in sampler.)

We will use terms of our language to describe computable Markov chains,
taking mild liberties with syntax. We assume in particular that programs
may call each other as subroutines (this can be done via substitutions),
and that manipulating finite structures is computable and thus representable in
the language.


The Metropolis-Hastings Algorithm. Recall that we wish to sample from a distribution
with un-normalised density $d : \mathbb{A} \to \mathbb{R}$; $d$ is assumed to be computable.
The Markov chain defined by the Metropolis-Hastings algorithm has two parameters:
a computable Markov chain $x : A \vdash P : A$, the proposal kernel, and a
measurable, computable function $p : \mathbb{A}^2 \to \mathbb{R}$ representing the kernel $[\![P]\!]$, i.e.


$$[\![P]\!](x, X') = \int_{x' \in X'} p(x, x')\,\mathrm{d}\lambda(x').$$

The Markov chain MH(P, p, d) is defined as


MH(P, p, d)(x) := let x' = P(x) in
                  let α = min(1, (d(x') × p(x', x)) / (d(x) × p(x, x'))) in
                  let u = sample uniform (0, 1) in
                  if u < α then x' else x


In words, the Markov chain works as follows: given a start state x, it generates 
a proposal for the next state x' using P. It then computes an acceptance ratio α,
which is the probability with which the new sample will be accepted: the return 
state will then either be the original x or x’, accordingly. 
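
The step just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the target density `d` (an unnormalised standard normal), the Gaussian random-walk proposal `propose` and its density `p` are hypothetical choices made for the example, and the acceptance ratio is the usual Metropolis-Hastings ratio.

```python
import math
import random


def mh_step(x, propose, p, d, rng):
    """One step of the chain: propose x', then accept it with probability alpha."""
    x_new = propose(x, rng)
    alpha = min(1.0, (d(x_new) * p(x_new, x)) / (d(x) * p(x, x_new)))
    u = rng.random()
    return x_new if u < alpha else x


def d(x):
    """Assumed target: unnormalised density of a standard normal."""
    return math.exp(-0.5 * x * x)


def propose(x, rng):
    """Assumed proposal kernel: Gaussian random walk around the current state."""
    return x + rng.gauss(0.0, 1.0)


def p(x, y):
    """Density of the assumed proposal kernel at y, starting from x."""
    return math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2 * math.pi)


rng = random.Random(0)
x = 0.0                      # initial state, with d(x) > 0
samples = []
for _ in range(20000):
    x = mh_step(x, propose, p, d, rng)
    samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Note that `d` is never normalised in this sketch: only the ratio of its values at two states enters the computation of `alpha`.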
Assuming $P$ and $p$ satisfy a number of conditions, the algorithm is correct:
Theorem 3. Assume that $P$ and $p$ satisfy the following properties:


1. Strong irreducibility: there exists $n \in \mathbb{N}$ such that for all $x \in \mathbb{A}$ and
$X \in \Sigma_{\mathbb{A}}$ such that $D(X) \neq 0$ and $d(x) > 0$, we have
$[\![P]\!]^n(x, X) > 0$.

2. $[\![P]\!](x, X) = \int_{x' \in X} p(x, x')\,\mathrm{d}\lambda(x')$.

3. If $d(x) > 0$ and $p(x, y) > 0$ then $d(y) > 0$.

4. If $d(x) > 0$ and $d(y) > 0$, then $p(x, y) > 0$ iff $p(y, x) > 0$.


Then, the limit of MH(P, p, d) for any initial state $x \in \mathbb{A}$ with $d(x) > 0$ is equal
to $D$, the distribution obtained after normalising $d$.


5.3 Our Proposal Kernel 


Consider a closed program $\vdash M : A$ in which every measurable function is a computable
one. Then, its interpretation as a concrete quantitative dataflow graph is
computable, and we write $S$ for the quantitative dataflow graph whose unfolding
is $[\![M]\!]_G$. Moreover, because $M$ is closed, its measure-theoretic semantics gives a
measure $[\![M]\!]$ on $[\![A]\!]$. Assume that $\mathrm{norm}([\![M]\!])$ is well-defined: it is a probability
distribution on $[\![A]\!]$. We describe how a Metropolis-Hastings algorithm may
be used to sample from it, by reducing this problem to that of sampling from
configurations of $E_S$ according to the following density:


$$d_S(x, q) := \prod_{s \in \mathrm{sample}(x)} d_s(q|_{[s)}, q(s)) \prod_{s \in \mathrm{score}(x)} q(s)$$


Lemma 10 induces a natural measure on $\mathcal{C}(E_S)$. We have:
Lemma 13. For all $X \in \Sigma_{\mathcal{C}(E_S)}$, $\mu^S(X) = \int_{y \in X} d_S(y)\,\mathrm{d}y$.


Note that $d_S(x, q)$ is easy to compute, but it is not normalised. Computing
the normalising factor is in general intractable, but the Metropolis-Hastings 
algorithm does not require the density to be normalised. 
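
To illustrate why this density is easy to compute, here is a minimal Python sketch. It assumes, as the surrounding text suggests, that the density of a configuration is the product of the densities of its sample events (each evaluated at the values of its predecessors) and of the values carried by its score events. The event names, the dictionary encoding and the helper `normal_pdf` are hypothetical.

```python
import math


def normal_pdf(mean, value):
    """Density of a normal distribution with unit variance."""
    return math.exp(-0.5 * (value - mean) ** 2) / math.sqrt(2 * math.pi)


def unnormalised_density(sample_events, score_events, q):
    """Multiply the density of each sample event by the value of each score event."""
    result = 1.0
    for s, (parents, density) in sample_events.items():
        history = {e: q[e] for e in parents}   # plays the role of q restricted to [s)
        result *= density(history, q[s])
    for s in score_events:
        result *= q[s]
    return result


# Hypothetical configuration: mu ~ normal(0), y ~ normal(mu), then one score event w.
sample_events = {
    "mu": ([], lambda history, v: normal_pdf(0.0, v)),
    "y": (["mu"], lambda history, v: normal_pdf(history["mu"], v)),
}
score_events = ["w"]
q = {"mu": 0.0, "y": 0.0, "w": 0.5}

value = unnormalised_density(sample_events, score_events, q)
```

The computation is a single pass over the events of the configuration; no integral, and in particular no normalising constant, is needed.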


Let us write $\mu^S_{\mathrm{norm}}$ for the normalised distribution obtained from $\mu^S$. By adequacy,
we have for all $X \in \Sigma_{[\![A]\!]}$:


$$\mathrm{norm}[\![M]\!](X) = \mu^S_{\mathrm{norm}}(\mathrm{result}^{-1}(X)),$$


where $\mathrm{result} : \max \mathcal{C}(E_S) \rightharpoonup [\![A]\!]$ maps a maximal configuration of $E_S$ to its
return value, if any. This says that sampling from $\mathrm{norm}[\![M]\!]$ amounts to sampling
from $\mu^S_{\mathrm{norm}}$ and only keeping the return value.

Accordingly, we focus on designing a Metropolis-Hastings algorithm for sampling
values in $\mathcal{C}(E_S)$ following the (unnormalised) density $d_S$. We start by
defining a proposal kernel for this algorithm.
To avoid overburdening the notation, we will no longer distinguish between
a type and its denotation. Since $G_S$ is finite, it can be represented by a type,
and so can $\mathcal{C}(G_S)$. Moreover, $\mathcal{C}(E_S)$ is a subset of $\sum_{x \in \mathcal{C}(G_S)} \mathcal{Q}(x)$, which is also
representable as the type of pairs $(x \in \mathcal{C}(G_S), q \in \mathcal{Q}(x))$. Operations on $G_S$ and
related objects are all computable and measurable so we can directly use them in
the syntax. In particular, we will make use of the function $\mathrm{ext} : \mathcal{C}(E_S) \to G_S + 1$
which for each configuration $(x, q) \in \mathcal{C}(E_S)$ returns $(1, s)$ if there exists an extension of $x$ by an event $s$
with $o_s(q|_{[s)})$ defined, and $(2, *)$ if $(x, q)$ is maximal.

Informally, for (x, q) € @(Es), the algorithm is: 


— Pick a sample event $s \in x$, randomly over the set of sample events of $x$.

— Construct $x_0 := x \setminus \{s' \in x \mid s' \geq s\} \cup \{s\} \in \mathcal{C}(G_S)$.

— Return a maximal extension $(x', q')$ of $(x_0, q|_{x_0})$ by only resampling the sample
events of $x'$ which are not in $x$.


The last step follows the single-site MH principle: sample events in $x \cap x'$ have
already been evaluated in $x$, and are not updated. However, events which are in
$x' \setminus x$ belong to conditional branches not explored in $x$; they must be sampled.

We start by formalising the last step of the algorithm. We give a probabilistic
program complete which takes two parameters, the original configuration (x, q)
and the current modification (x0, q0), and returns a possible maximal extension:


complete(x, q, x0, q0) = case ext(x0, q0) of
  (2, ()) ⇒ (x0, q0)
  (1, s) ⇒
    if s is a return or a score event then
      complete(x, q, x0 ∪ {s}, q0[s := o_s(q0)])
    else if s ∈ x then
      complete(x, q, x0 ∪ {s}, q0[s := q(s)])
    else
      complete(x, q, x0 ∪ {s}, q0[s := sample d_s(q0)])


The program starts by trying to extend (xo, qo) by calling ext. If (xo, qo) is 
already maximal, we directly return it. Otherwise, we get an event s. To extend 
the quantitative information, there are three cases: 


— if s is not a sample event, i.e. (since S is closed) it is a return or a score
event, we use the function o_s;

— if s is a sample event occurring in x, we use the value in q;

— if s is a sample event not occurring in x, we sample a value for it.
This program is recursive, but because $G_S$ is finite, there is a static bound on
the number of recursive calls; thus this program can be unfolded to a program
expressible in our language. We can now define the proposal kernel:


P_S(x, q) =
  let s = sample uniformly over sample events in x in
  let r = sample d_s(q|[s)) in
  let x0 = x \ {s' > s | s' ∈ x} in
  complete(x, q, x0, q[s := r])
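
The proposal kernel can be sketched in Python on a toy dependency graph. Everything below is a hypothetical encoding made for illustration only: the dictionary `GRAPH`, the per-event `enabled` predicates (which stand in for the conditional occurrence of events) and the `draw` samplers are assumptions of this sketch, not the paper's data structures.

```python
import random

# Hypothetical dependency graph for the program:
#   let b = sample bernoulli in
#   if b then (let y = sample normal in return y) else return 0
GRAPH = {
    "b":     {"kind": "sample", "parents": [],
              "enabled": lambda q: True,
              "draw": lambda q, rng: 1.0 if rng.random() < 0.5 else 0.0},
    "y":     {"kind": "sample", "parents": ["b"],
              "enabled": lambda q: q["b"] == 1.0,
              "draw": lambda q, rng: rng.gauss(0.0, 1.0)},
    "ret_y": {"kind": "return", "parents": ["b", "y"],
              "enabled": lambda q: q["b"] == 1.0,
              "outcome": lambda q: q["y"]},
    "ret_0": {"kind": "return", "parents": ["b"],
              "enabled": lambda q: q["b"] == 0.0,
              "outcome": lambda q: 0.0},
}


def ancestors(e):
    """All events strictly below e in the causal order."""
    found = set()
    for parent in GRAPH[e]["parents"]:
        found |= {parent} | ancestors(parent)
    return found


def ext(x0, q0):
    """Return an event that can extend (x0, q0), or None if it is maximal."""
    for s, ev in GRAPH.items():
        if s not in x0 and all(p in x0 for p in ev["parents"]) and ev["enabled"](q0):
            return s
    return None


def complete(x, q, x0, q0, rng):
    """Extend (x0, q0) to a maximal configuration, reusing the values of q on x."""
    x0, q0 = set(x0), dict(q0)
    while True:
        s = ext(x0, q0)
        if s is None:
            return x0, q0
        ev = GRAPH[s]
        if ev["kind"] in ("return", "score"):
            q0[s] = ev["outcome"](q0)       # determined by the outcome function
        elif s in x:
            q0[s] = q[s]                    # sample event already evaluated in x
        else:
            q0[s] = ev["draw"](q0, rng)     # sample event from an unexplored branch
        x0.add(s)


def propose(x, q, rng):
    """Single-site proposal: resample one sample event of x, then complete."""
    s = rng.choice(sorted(e for e in x if GRAPH[e]["kind"] == "sample"))
    r = GRAPH[s]["draw"](q, rng)
    x0 = {e for e in x if s not in ancestors(e)}   # drop the strict descendants of s
    q0 = {e: q[e] for e in x0}
    q0[s] = r
    return complete(x, q, x0, q0, rng)


rng = random.Random(1)
x, q = {"b", "ret_0"}, {"b": 0.0, "ret_0": 0.0}
for _ in range(100):
    x, q = propose(x, q, rng)
```

The loop in `complete` replaces the recursion of the program above; since the graph is finite, it runs at most once per event.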


We now need to compute the density for Ps to be able to apply Metropolis- 
Hastings. Given (x, q), (x',q') € @(Eg), we define: 


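[Equation not recoverable from the extraction: the definition of the density $p_S((x, q), (x', q'))$, whose terms are indexed by $s \in \mathrm{sample}(x)$ and $s' \in \mathrm{sample}(x' \setminus x)$.]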


Theorem 4. The Markov chain $P_S$ and density $p_S$ satisfy the hypotheses
of Theorem 3. As a result, for any $(x, q) \in \mathcal{C}(E_S)$ the distribution
$[\![\mathrm{MH}(P_S, p_S, d_S)^n]\!]((x, q), \cdot)$ tends to $\mu^S_{\mathrm{norm}}$ as $n$ goes to infinity.


One can thus sample from norm([M]) using the algorithm above, keeping 
only the return value of the obtained configuration. 

Let us re-state the key advantage of our approach: having access to the data 
dependency information, complete requires fewer steps in general, because at 
each proposal step only a portion of the graph needs exploring. 


6 Conclusion 


Related Work. There are numerous approaches to the semantics of programs 
with random choice. Among those concerned with statistical applications of 
probabilistic programming are Staton et al. [18,19], Ehrhard et al. [7], and 
Dahlqvist et al. [6]. A game semantics model was announced in [15]. 

The work of Ścibior et al. [17] was influential in suggesting a denotational
approach for proving correctness of inference, in the framework of quasi-Borel 
spaces [9]. It is not clear however how one could reason about data dependencies 
in this framework, because of the absence of explicit causal information. 

Hur et al. [11] give a proof of correctness for Trace MCMC using new forms
of operational semantics for probabilistic programs. This method is extended to 
higher-order programs with soft constraints in Borgström et al. [2]. However, 
these approaches do not consider incremental recomputation. 

To the best of our knowledge, this is the first work addressing formal cor- 
rectness of incremental recomputation in MCMC. However, methods exist which 
take advantage of data dependency information to improve the performance of 
each proposal step in “naive” Trace MCMC. We mention in particular the work 




on slicing by Hur et al. [10]; other approaches include [5,24]. In the present work 
we claim no immediate improvement in performance over these techniques, but 
only a mathematical framework for reasoning about the structures involved. 

It is worth remarking that our event structure representation is reminiscent
of the graphical model representations made explicit in some languages. Indeed, for a
first-order language such as the one of this paper, Bayesian networks can directly 
be used as a semantics, see [20]. We claim that the alternative view offered by 
event structures will allow for an easier extension to higher-order programs, using 
ideas from game semantics. 


Perspectives. This is the start of an investigation into intensional semantics for 
probabilistic programs. Note that the framework of event structures is very flex- 
ible and the semantics presented here is by no means the only possible one. 
Additionally, though the present work only treats the case of a first-order lan- 
guage, we believe that building on recent advances in probabilistic concurrent 
game semantics [3,16] (from which the present work draws much inspiration), we 
can extend the techniques of this paper to arbitrary higher-order probabilistic 
programs with recursion. 
Abstract. Algebraic effects and handlers are a powerful abstraction
mechanism to represent and implement control effects. In this work, 
we study their extension with parametric polymorphism that allows 
abstracting not only expressions but also effects and handlers. Although 
polymorphism makes it possible to reuse and reason about effect imple- 
mentations more effectively, it has long been known that a naive combi- 
nation of polymorphic effects and let-polymorphism breaks type safety. 
Although type safety can often be gained by restricting let-bound 
expressions—e.g., by adopting value restriction or weak polymorphism— 
we propose a complementary approach that restricts handlers instead of 
let-bound expressions. Our key observation is that, informally speaking, a 
handler is safe if resumptions from the handler do not interfere with each 
other. To formalize our idea, we define a call-by-value lambda calculus
$\lambda_{\mathrm{eff}}^{\mathrm{let}}$ that supports let-polymorphism and polymorphic algebraic effects
and handlers, design a type system that rejects interfering handlers, and 
prove type safety of our calculus. 


1 Introduction 


Algebraic effects [20] and handlers [21] are a powerful abstraction mechanism 
to represent and implement control effects, such as exceptions, interactive I/O, 
mutable states, and nondeterminism. They are growing in popularity, thanks to 
their success in achieving modularity of effects, especially the clear separation 
between their interfaces and their implementations. An interface of effects is 
given as a set of operations—e.g., an interface of mutable states consists of two 
operations, namely, put and get—with their signatures. An implementation is 
given by a handler H, which provides a set of interpretations of the operations
(called operation clauses), and a handle-with expression handle M with H associates
effects invoked during the computation of M with handler H. Algebraic
effects and handlers work as resumable exceptions: when an effect operation is 
invoked, the run-time system tries to find the nearest handler that handles the 
invoked operation; if it is found, the corresponding operation clause is evaluated 
by using the argument to the operation invocation and the continuation up to 
the handler. The continuation gives the ability to resume the computation from 
the point where the operation was invoked, using the result from the operation 
clause. Another modularity that algebraic effects provide is flexible composition:
multiple algebraic effects can be combined freely [13]. 

In this work, we study an extension of algebraic effects and handlers with 
another type-based abstraction mechanism—parametric polymorphism [22]. In 
general, parametric polymorphism is a basis of generic programming and enhances
code reusability by abstracting expressions over types. This work allows abstracting
not only expressions but also effect operations and handlers, which makes it
possible to reuse and reason about effect implementations that are independent 
of concrete type representations. Like in many functional languages, we intro- 
duce polymorphism in the form of let-polymorphism for its practically desirable 
properties such as decidable typechecking and type inference. 

As is well known, however, a naive combination of polymorphic effects and 
let-polymorphism breaks type safety [11,23]. Many researchers have attacked this 
classical problem [1,2,10,12,14,17,23,24], and their common idea is to restrict 
the form of let-bound expressions. For example, value restriction [23,24], which 
is the standard way to make ML-like languages with imperative features and 
let-polymorphism type safe, allows only syntactic values to be polymorphic. 

In this work, we propose a new approach to achieving type safety in a language
with let-polymorphism and polymorphic effects and handlers: the idea is
to restrict handlers instead of let-bound expressions. Since a handler gives an 
implementation of an effect, our work can be viewed as giving a criterion that 
suggests what effects can cooperate safely with (unrestricted) let-polymorphism 
and what effects cannot. Our key observation for type safety is that, informally 
speaking, an invocation of a polymorphic effect in a let-bound expression is safe 
if resumptions in the corresponding operation clause do not interfere with each 
other. We formalize this discipline into a type system and show that typeable 
programs do not get stuck. 

Our contributions are summarized as follows. 


— We introduce a call-by-value, statically typed lambda calculus $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ that supports
let-polymorphism and polymorphic algebraic effects and handlers. The
type system of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ allows any let-bound expressions involving effects to be
polymorphic, but, instead, disallows handlers where resumptions interfere
with each other.

— To give the semantics of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$, we formalize an intermediate language $\lambda_{\mathrm{eff}}^{\Lambda}$
wherein type information is made explicit and define a formal elaboration
from $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ to $\lambda_{\mathrm{eff}}^{\Lambda}$.

— We prove type safety of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ by type preservation of the elaboration and type
soundness of $\lambda_{\mathrm{eff}}^{\Lambda}$.


We believe that our approach is complementary to the usual approach of restrict- 
ing let-bound expressions: for handlers that are considered unsafe by our crite- 
rion, the value restriction can still be used. 

The rest of this paper is organized as follows. Section 2 provides an overview 
of our work, giving motivating examples of polymorphic effects and handlers, 
a problem in naive combination of polymorphic effects and let-polymorphism, 
and our solution to gain type safety with those features. Section 3 defines the
surface language $\lambda_{\mathrm{eff}}^{\mathrm{let}}$, and Sect. 4 defines the intermediate language $\lambda_{\mathrm{eff}}^{\Lambda}$ and
the elaboration from $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ to $\lambda_{\mathrm{eff}}^{\Lambda}$. We also state that the elaboration is type-preserving
and that $\lambda_{\mathrm{eff}}^{\Lambda}$ is type sound in Sect. 4. Finally, we discuss related work
in Sect. 5 and conclude in Sect. 6. The proofs of the stated properties and the
full definition of the elaboration are given in the full version at https://arxiv.org/abs/1811.07332.


2 Overview 


We start with reviewing how monomorphic algebraic effects and handlers work 
through examples and then extend them to a polymorphic version. We also 
explain why polymorphic effects are inconsistent with let-polymorphism, if 
naively combined, and how we resolve it. 


2.1 Monomorphic Algebraic Effects and Handlers 


Exception. Our first example is exception handling, shown in an ML-like lan- 
guage below. 


1  effect fail : unit ↪ unit
2
3  let div100 (x:int) : int =
4    if x = 0 then (#fail(); -1)
5    else 100 / x
6
7  let f (y:int) : int option =
8    handle (div100 y) with
9      return z → Some z
10     fail z → None

Some and None are constructors of datatype α option. Line 1 declares an effect
operation fail, which signals that an anomaly happens, with its signature
unit ↪ unit, which means that the operation is invoked with the unit value (),
causes some effect, and may return the unit value. The function div100, defined
in Lines 3-5, is an example that uses fail; it returns the number obtained by
dividing 100 by argument x if x is not zero; otherwise, if x is zero, it raises
an exception by calling effect operation fail. In general, we write #op(M)
for invoking effect operation op with argument M. The function f (Lines 7-10)
calls div100 inside a handle-with expression, which returns Some n if div100
returns integer n normally and returns None if it invokes fail.

An expression of the form handle M with H handles effect operations 
invoked in M (which we call handled expression) according to the effect inter- 
pretations given by handler H. A handler H consists of two parts: a single return 


1 Here, “; -1” is necessary to make the types of both branches the same; it becomes 
unnecessary when we introduce polymorphic effects. 
clause and zero or more operation clauses. A return clause return x → M' will
be executed if the evaluation of M results in a value v. Then, the value of M'
(where x is bound to v) will be the value of the entire handle-with expression.
For example, in the program above, if a nonzero number n is passed to f, the
handle-with expression would return Some (100/n) because div100 n returns
100/n. An operation clause op x → M' defines an implementation of effect op:
if the evaluation of handled expression M invokes effect op with argument v,
expression M' will be evaluated after substituting v for x and the value of M'
will be the value of the entire handle-with expression. In the program example
above, if zero is given to f, then None will be returned because div100 0 invokes
fail.
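
The behaviour of this handler can be imitated in Python, under the assumption that an effect invocation is modelled by a generator `yield` and that the handler is the code driving the generator. This is only an analogy for readers who want to run the example; the tuple `("Some", z)` and the value `None` are hypothetical stand-ins for the constructors Some and None.

```python
def div100(x):
    """Generator version of div100: yielding ("fail", ()) models #fail()."""
    if x == 0:
        yield ("fail", ())
        return -1                  # only reached if a handler resumes the computation
    return 100 // x                # integer division (Python floors, ML truncates)


def f(y):
    """Models: handle (div100 y) with return z -> Some z | fail z -> None."""
    computation = div100(y)
    try:
        next(computation)          # runs until the first effect invocation, if any
    except StopIteration as done:
        return ("Some", done.value)    # return clause
    return None                    # fail clause: the continuation is discarded


nonzero_case = f(5)
zero_case = f(0)
```

Because the fail clause never resumes the suspended generator, the dummy value -1 is never produced, which matches the role of fail as an exception.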

As shown above, algebraic effect handling is similar to exception handling. 
However, a distinctive feature of algebraic effect handling is that it allows 
resumption of the computation from the point where an effect operation was 
invoked. The next example demonstrates such an ability of algebraic effect 
handlers. 


Choice. The next example is effect choose, which returns one of the given two 
arguments. 


1  effect choose : int × int ↪ int
2
3  handle (#choose(1,2) + #choose(10,20)) with
4    return x → x
5    choose x → resume (fst x)


As usual, A1 × A2 is a product type, (M1, M2) is a pair expression, and fst
is the first projection function. The first line declares that effect choose is for
choosing integers. The handled expression #choose(1,2) + #choose(10,20)
intuitively suggests that there would be four possible results (11, 21, 12, and
22) depending on which value each invocation of choose returns. The handler
in this example always chooses the first element of a given pair² and returns
it by using a resume expression, and, as a result, the expression in Lines 3-5
evaluates to 11.

A resumption expression resume M in an operation clause makes it possible 
to return a value of M to the point where an effect operation was invoked. This 
behavior is realized by constructing a delimited continuation from the point of 
the effect invocation up to the handle-with expression that deals with the effect
and passing the value of M to the continuation. We illustrate it by using the program
above. When the handled expression #choose(1,2) + #choose(10,20)
is evaluated, continuation c ≝ [] + #choose(10,20) is constructed. Then, the
body resume (fst x) of the operation clause is evaluated after binding x to the
invocation argument (1,2). Receiving the value 1 of fst (1,2), the resumption


2 We can think of more practical implementations, which choose one of the two argu- 
ments by other means, say, random values. 
expression passes it to the continuation c and c[1] = 1 + #choose(10,20) is evaluated
under the same handler. Next, choose is invoked with argument (10,20).
Similarly, continuation c' ≝ 1 + [] is constructed and the operation clause for
choose is executed again. Since fst (10,20) evaluates to 10, c'[10] = 1 + 10
is evaluated under the same handler. Since the return clause returns what it
receives, the entire expression evaluates to 11.
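
The evaluation traced above can be reproduced with a small Python sketch, assuming that the handled expression is written as a generator and that `send` plays the role of resume. This encoding only covers clauses that resume exactly once, as the clause for choose does here; the function names `computation` and `handle` are hypothetical.

```python
def computation():
    """Handled expression: #choose(1,2) + #choose(10,20)."""
    a = yield ("choose", (1, 2))
    b = yield ("choose", (10, 20))
    return a + b


def handle(gen, choose_clause, return_clause):
    """Run gen, answering each choose invocation by resuming with choose_clause(arg)."""
    try:
        op, arg = next(gen)
        while True:
            assert op == "choose"
            # send resumes the suspended computation, like resume (fst x)
            op, arg = gen.send(choose_clause(arg))
    except StopIteration as done:
        return return_clause(done.value)


# choose x -> resume (fst x), and return x -> x
result = handle(computation(), lambda pair: pair[0], lambda x: x)
```

Here `result` is 11, as in the text; replacing the first projection by the second one in the clause yields 22 instead.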

Finally, we briefly review how an operation clause involving resumption
expressions is typechecked [3,13,16]. Let us consider operation clause
op(x) → M for op of type signature A ↪ B. The typechecking is performed
as follows. First, argument x is assigned the domain type A of the signature as
it will be bound to an argument of an effect invocation. Second, for resumption
expression resume M' in M, (1) M' is required to have the codomain type B of
the signature because its value will be passed to the continuation as the result
of the invocation and (2) the resumption expression is assigned the same type as
the return clause. Third, the type of the body M has to be the same as that of
the return clause because the value of M is the result of the entire handle-with
expression. For example, the above operation clause for choose is typechecked
as follows: first, argument x is assigned type int × int; second, it is checked
whether the argument fst x of the resumption expression has int, the codomain
type of choose; third, it is checked whether the body resume (fst x) of the
clause has the same type as the return clause, i.e., int. If all the requirements
are satisfied, the clause is well typed.


2.2 Polymorphic Algebraic Effects and Handlers 


This section discusses motivation for polymorphism in algebraic effects and han- 
dlers. There are two ways to introduce polymorphism: by parameterized effects 
and by polymorphic effects. 

The former is used to parameterize the declaration of an effect by types. For 
example, one might declare: 


effect α choose : α × α ↪ α


An invocation #choose involves a parameterized effect of the form A choose
(where A denotes a type), according to the type of arguments: for example,
#choose(true,false) has the effect bool choose and #choose(1,-1) has int
choose. Handlers are required for each effect A choose.

The latter is used to give a polymorphic type to an effect. For example, one 
may declare 


effect choose : ∀α. α × α ↪ α


In this case, the effect can be invoked with different types, but all invocations 
have the same effect choose. One can implement a single operation clause that 
can handle all invocations of choose, regardless of argument types. Koka sup- 
ports both styles [16] (with the value restriction); we focus, however, on the 
latter in this paper. A type system for parameterized effects lifting the value 
restriction is studied by Kammar and Pretnar [14] (see Sect. 5 for comparison). 
In what follows, we show a polymorphic version of the examples we have
seen, along with brief discussions on how polymorphic effects help with reasoning 
about effect implementations. Other practical examples of polymorphic effects 
can be found in Leijen’s work [16]. 


Polymorphic Exception. First, we extend the exception effect fail with polymorphism.


1  effect fail^∀ : ∀α. unit ↪ α
2
3  let div100^∀ (x:int) : int =
4    if x = 0 then #fail^∀()
5    else 100 / x


The polymorphic type signature of effect fail^∀, given in Line 1, means that the
codomain type α can be any type. Thus, we do not need to append the dummy value
-1 to the invocation of fail^∀ by instantiating the bound type variable α with
int (the shaded part).


Choice. Next, let us make choose polymorphic. 


1  effect choose^∀ : ∀α. α × α ↪ α
2
3  let rec random_walk (x:int) : int =
4    let b = #choose^∀(true,false) in
5    if b then random_walk (x + #choose^∀(1,-1))
6    else x
7
8  let f (s:int) =
9    handle random_walk s with
10     return x → x
11     choose^∀ y → if rand() < 0.0 then resume (fst y)
12                  else resume (snd y)


The function random_walk implements random walk; it takes the current coordinate
x, chooses whether it stops, and, if it decides to continue, recursively calls
itself with a new coordinate. In the definition, choose^∀ is used twice with different
types: bool and int. Lines 11-12 give choose^∀ an interpretation, which
calls rand to obtain a random float,³ and returns either the first or the second
element of y.
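
The random walk can be imitated in Python with the same generator encoding of effects, assuming again that each operation clause resumes exactly once. Python is dynamically typed, so the sketch only illustrates that one clause can serve invocations at both bool and int; the seeded generator `rng` and the use of `uniform(-1.0, 1.0)` for rand are assumptions of this example.

```python
import random


def random_walk(x):
    """Generator version of random_walk; each yield models an invocation of choose."""
    b = yield ("choose", (True, False))      # used at type bool
    if b:
        step = yield ("choose", (1, -1))     # used at type int
        return (yield from random_walk(x + step))
    return x


def handle(computation, choose_clause):
    """Answer every choose invocation by resuming once with choose_clause(arg)."""
    try:
        _, arg = next(computation)
        while True:
            _, arg = computation.send(choose_clause(arg))
    except StopIteration as done:
        return done.value                    # return clause: return x -> x


rng = random.Random(0)


def clause(pair):
    """A single clause for both types: it only ever picks a component of the pair."""
    return pair[0] if rng.uniform(-1.0, 1.0) < 0.0 else pair[1]


final = handle(random_walk(0), clause)
```

A clause that always takes the second component stops the walk immediately, since the first invocation then returns false.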

Typechecking of operation clauses could be extended in a straightforward
manner. That is, an operation clause op(x) → M for an effect operation of
signature ∀α. A ↪ B would be typechecked as follows: first, α is locally bound
in the clause and x is assigned type A; second, an argument of a resumption


3 One might implement rand as another effect operation. 
expression must have type B (which may contain type variable α); third, M
must have the same type as that of the return clause (its type cannot contain α
as α is local) under the assumption that resumption expressions have the same
type as the return clause. For example, let us consider typechecking of the above
operation clause for choose^∀. First, the typechecking algorithm allocates a local
type variable α and assigns type α × α to y. The body has two resumption
expressions, and it is checked whether the arguments fst y and snd y have
the codomain type α of the signature. Finally, it is checked whether the body
is typed at int assuming that the resumption expressions have type int. The
operation clause meets all the requirements, and, therefore, it would be well
typed.

An obvious advantage of polymorphic effects is reusability. Without polymorphism, one
has to declare many versions of choose for different types.

Another pleasant effect of polymorphic effects is that, thanks to parametricity,
inappropriate implementations for an effect operation can be excluded. For example, it
is not possible for an implementation of choose∀ to resume with values other than the
first or second element of y. In the monomorphic version, however, it is possible to
resume with any integer, as opposed to what the name of the operation suggests. A
similar argument applies to fail∀; since the codomain type is $\alpha$, which does not
appear in the domain type, it is not possible to resume! In other words, the signature
$\forall\alpha.\,\mathsf{unit} \hookrightarrow \alpha$ enforces that no invocation of
fail∀ will return.


2.3 Problem in Naive Combination with Let-Polymorphism 


Although polymorphic effects and handlers provide an ability to abstract and 
restrict effect implementations, one may easily expect that their unrestricted 
use with naive let-polymorphism, which allows any let-bound expressions to be 
polymorphic, breaks type safety. Indeed, it does. 

We develop a counterexample, inspired by Harper and Lillibridge [11], below. 


effect get_id : ∀α. unit ↪ (α → α)


let f () : int =
  let g = #get_id() in (* g : ∀α. α → α *)
  if (g true) then ((g 0) + 1) else 2


The function f first binds g to the invocation result of get_id. The expression
#get_id() is given type $\alpha \to \alpha$ and the naive let-polymorphism would assign
type scheme $\forall\alpha.\,\alpha \to \alpha$ to g, which makes both g true and g 0
(and thus the definition of f) well typed.

An intended use of f is as follows: 


handle f () with
  return x → x
  get_id y → resume (λz. z)


The operation clause for get_id resumes with the identity function $\lambda z.z$. It
would be well typed under the typechecking procedure described in Sect. 2.2
and it safely returns 1.

However, the following strange expression 


handle f () with
  return x → x
  get_id y → resume (λz1. (resume (λz2. z1)); z1)


will get stuck, although this expression would be well typed: both
$\lambda z_1.\,\cdots; z_1$ and $\lambda z_2.\,z_1$ could be given type
$\alpha \to \alpha$ by assigning both $z_1$ and $z_2$ type $\alpha$, which is the type
variable local to this clause. Let us see how the evaluation gets stuck in detail. When
the handled expression f () invokes effect get_id, the following continuation will be
constructed:


c ≝ let g = [] in if (g true) then ((g 0) + 1) else 2.


Next, the body of the operation clause get_id is evaluated. It immediately
resumes and reduces to


c′[(λz1. c′[(λz2.z1)]; z1)]


where


c′ ≝ handle c with
       return x → x
       get_id y → resume (λz1. (resume (λz2.z1)); z1),


which is the continuation c under the same handler. The evaluation proceeds as
follows (here, k ≝ λz1. c′[(λz2.z1)]; z1):


c′[(λz1. c′[(λz2.z1)]; z1)]
=  handle let g = k in if (g true) then ((g 0) + 1) else 2 with ...
→  handle if (k true) then ((k 0) + 1) else 2 with ...
→  handle if c′[(λz2.true)]; true then ((k 0) + 1) else 2 with ...


Here, the hole in c′ is filled by function (λz2.true), which returns a Boolean
value, though the hole is supposed to be filled by a function of type
$\forall\alpha.\,\alpha \to \alpha$. This mismatch triggers a run-time error:


c′[(λz2.true)]
=  handle let g = λz2.true in if (g true) then ((g 0) + 1) else 2 with ...
→* handle if true then (((λz2.true) 0) + 1) else 2 with ...
→  handle ((λz2.true) 0) + 1 with ...
→  handle true + 1 with ...


We stop here because true + 1 cannot reduce. 
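The reduction sequence above can be replayed in Python by writing the continuation c as an ordinary function. This is a sketch under assumptions: `add_int`, `c`, `run`, `good`, and `bad` are hypothetical names, and because Python itself evaluates `True + 1` to 2, the sketch uses an explicit type check to stand in for the stuck term `true + 1`.

```python
def add_int(a, b):
    # Python would happily compute True + 1 == 2, so operand types are checked
    # explicitly to mimic a language in which `true + 1` cannot reduce.
    if type(a) is not int or type(b) is not int:
        raise TypeError(f"stuck: {a!r} + {b!r}")
    return a + b

def c(g):
    """Continuation of #get_id() in f:
    let g = [] in if (g true) then ((g 0) + 1) else 2."""
    return add_int(g(0), 1) if g(True) else 2

def run(clause):
    """handle f () with return x → x; get_id y → clause(resume).

    The handler is deep and its return clause is the identity, so resuming
    amounts to running c again under the same handler.
    """
    return clause(c)

def good(resume):
    # get_id y → resume (λz. z)
    return resume(lambda z: z)

def bad(resume):
    # get_id y → resume (λz1. (resume (λz2. z1)); z1)
    def k(z1):
        resume(lambda z2: z1)   # inner resumption; its result is discarded
        return z1
    return resume(k)
```

Here `run(good)` evaluates to 1, whereas `run(bad)` raises `TypeError` at the point corresponding to `true + 1`: the inner resumption runs c with g bound to a function that always returns `True`.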
2.4 Our Solution


A standard approach to this problem is to restrict the form of let-bound expressions
by some means such as the (relaxed) value restriction [10,23,24] or weak
polymorphism [1,12]. This approach amounts to restricting how effect operations
can be used.

In this paper, we seek a complementary approach, which is to restrict how effect
operations can be implemented.⁴ More concretely, we develop a type system such that
let-bound expressions are polymorphic as long as they invoke only "safe" polymorphic
effects, and the notion of safe polymorphic effects is formalized in terms of typing
rules (for handlers).

To see what "safe" effects are, let us examine the above counterexample to
type safety. The crux of the counterexample is that


1. continuation c uses g polymorphically, namely, as $\mathsf{bool} \to \mathsf{bool}$
in g true and as $\mathsf{int} \to \mathsf{int}$ in g 0;

2. c is invoked twice; and

3. the use of g as $\mathsf{bool} \to \mathsf{bool}$ in the first invocation of c
(where g is bound to $\lambda z_1.\,\cdots; z_1$) "alters" the type of
$\lambda z_2.\,z_1$ (passed to resume) from $\alpha \to \alpha$ to
$\alpha \to \mathsf{bool}$, contradicting the second use of g as
$\mathsf{int} \to \mathsf{int}$ in the second invocation of c.


The last point is crucial: if $\lambda z_2.\,z_1$ were, say, $\lambda z_2.\,z_2$, there
would be no influence from the first invocation of c and the evaluation would succeed.
The problem we see here is that the naive type system mistakenly allows interference
between the arguments to the two resumptions by assuming that $z_1$ and $z_2$ share the
same type.

Based on this observation, the typing rule for resumption is revised to disallow
interference between different resumptions by separating their types: for each
resume $M$ in the operation clause for
$\mathsf{op} : \forall\alpha_1 \cdots \alpha_n.\,A \hookrightarrow B$, $M$ has to have
type $B'$ obtained by renaming all type variables $\alpha_i$ in $B$ with fresh type
variables $\alpha_i'$. In the case of get_id, the two resumptions should be called with
$\beta \to \beta$ and $\gamma \to \gamma$ for fresh $\beta$ and $\gamma$; for the first
resume to be well typed, $z_1$ has to be of type $\beta$, although it means that the
return type of $\lambda z_2.\,z_1$ (given to the second resumption) is $\beta$, making
the entire clause ill typed, as we expect. If a clause does not have interfering
resumptions like


get_id y → resume (λz1.z1)
or
get_id y → resume (λz1. (resume (λz2.z2)); z1),


it will be well typed.
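The effect of the revised rule on get_id clauses can be illustrated with a small checker. This is a sketch, not the paper's algorithm: it covers only variables, lambda abstractions, sequencing, and resume; it ignores effects and the renaming of the clause parameter; and all names (`well_typed`, `fresh_names`, `shared_name`, the tuple encoding of terms) are assumptions made for the illustration. The only difference between the naive and the revised discipline is whether every resume receives the same type variable or a fresh one.

```python
import itertools

ANSWER = "int"   # type of the return clause, i.e., the result type of resume

def fun(dom, cod):
    return ("fun", dom, cod)

def well_typed(term, expected, env, new_var):
    """Check `term` against `expected` (None means any type) under `env`.

    `new_var` supplies the type variable used for each resume:
    one shared name for the naive rule, a fresh name for the revised rule.
    """
    kind = term[0]
    if kind == "var":
        return expected is None or env.get(term[1]) == expected
    if kind == "lam":
        if not isinstance(expected, tuple):
            return False
        _, dom, cod = expected
        return well_typed(term[2], cod, {**env, term[1]: dom}, new_var)
    if kind == "seq":                       # e1; e2
        return (well_typed(term[1], None, env, new_var)
                and well_typed(term[2], expected, env, new_var))
    if kind == "resume":                    # argument must have type v → v
        if expected not in (None, ANSWER):
            return False
        v = new_var()
        return well_typed(term[1], fun(v, v), env, new_var)
    raise ValueError(kind)

def fresh_names():
    counter = itertools.count(1)
    return lambda: f"b{next(counter)}"      # revised rule: fresh per resume

def shared_name():
    return lambda: "a"                      # naive rule: one variable for all

def lam(x, body): return ("lam", x, body)
def var(x): return ("var", x)
def seq(a, b): return ("seq", a, b)
def resume(a): return ("resume", a)

identity = resume(lam("z1", var("z1")))
harmless = resume(lam("z1", seq(resume(lam("z2", var("z2"))), var("z1"))))
interfering = resume(lam("z1", seq(resume(lam("z2", var("z1"))), var("z1"))))
```

With `shared_name()` the interfering clause is accepted, reproducing the unsoundness of the naive rule; with `fresh_names()` it is rejected because $z_1$ has the variable of the outer resume while the inner resume expects its own, and the two non-interfering clauses remain accepted.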


4 We compare our approach with the standard approaches in Sect. 5 in detail. 
3 Surface Language: $\lambda_{\mathrm{eff}}^{\mathrm{let}}$


We define a lambda calculus $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ that supports
let-polymorphism, polymorphic algebraic effects, and handlers without interfering
resumptions. This section introduces the syntax and the type system of
$\lambda_{\mathrm{eff}}^{\mathrm{let}}$. The semantics is given by a formal elaboration
to the intermediate calculus $\lambda_{\mathrm{eff}}^{\Lambda}$, which will be introduced
in Sect. 4.


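\[
\begin{array}{llcl}
\text{Effect operations} & \mathsf{op} & & \qquad \text{Type variables}\quad \alpha, \beta, \gamma \\
\text{Effects} & \epsilon & ::= & \text{sets of effect operations} \\
\text{Base types} & \iota & ::= & \mathsf{bool} \mid \mathsf{int} \mid \cdots \\
\text{Types} & A, B, C, D & ::= & \alpha \mid \iota \mid A \to \epsilon\; B \\
\text{Type schemes} & \sigma & ::= & A \mid \forall\alpha.\sigma \\
\text{Constants} & c & ::= & \mathsf{true} \mid \mathsf{false} \mid 0 \mid + \mid \cdots \\
\text{Terms} & M & ::= & x \mid c \mid \lambda x.M \mid M_1\,M_2 \mid \mathsf{let}\ x = M_1\ \mathsf{in}\ M_2 \mid {} \\
 & & & \#\mathsf{op}(M) \mid \mathsf{handle}\ M\ \mathsf{with}\ H \mid \mathsf{resume}\ M \\
\text{Handlers} & H & ::= & \mathsf{return}\ x \to M \mid H; \mathsf{op}(x) \to M \\
\text{Typing contexts} & \Gamma & ::= & \emptyset \mid \Gamma, x : \sigma \mid \Gamma, \alpha
\end{array}
\]


Fig. 1. Syntax of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$.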


3.1 Syntax 


The syntax of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ is given in Fig. 1. Effect
operations are denoted by $\mathsf{op}$ and type variables by $\alpha$, $\beta$, and
$\gamma$. An effect, denoted by $\epsilon$, is a finite set of effect operations. We
write $\langle\rangle$ for the empty effect set. A type, denoted by $A$, $B$, $C$, and
$D$, is a type variable; a base type $\iota$, which includes, e.g., bool and int; or a
function type $A \to \epsilon\; B$, which is given to functions that take an argument of
type $A$ and compute a value of type $B$ possibly with effect $\epsilon$. A type scheme
$\sigma$ is obtained by abstracting type variables. Terms, denoted by $M$, consist of
variables; constants (including primitive operations); lambda abstractions
$\lambda x.M$, which bind $x$ in $M$; function applications; let-expressions
$\mathsf{let}\ x = M_1\ \mathsf{in}\ M_2$, which bind $x$ in $M_2$; effect invocations
$\#\mathsf{op}(M)$; handle-with expressions $\mathsf{handle}\ M\ \mathsf{with}\ H$; and
resumption expressions $\mathsf{resume}\ M$. All type information in
$\lambda_{\mathrm{eff}}^{\mathrm{let}}$ is implicit; thus the terms have no type
annotations. A handler $H$ has a single return clause $\mathsf{return}\ x \to M$, where
$x$ is bound in $M$, and zero or more operation clauses of the form
$\mathsf{op}(x) \to M$, where $x$ is bound in $M$. A typing context $\Gamma$ binds a
sequence of variable declarations $x : \sigma$ and type variable declarations $\alpha$.
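As an illustration, the term and handler grammar of Fig. 1 could be transcribed into an abstract syntax tree as follows. This is a sketch under assumptions: the class and field names are hypothetical, types and effects are left out because the surface language carries no type annotations, and the unit constant is represented by Python's empty tuple.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: object              # true, false, 0, +, ...

@dataclass(frozen=True)
class Lam:                     # λx.M
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:                     # M1 M2
    fn: "Term"
    arg: "Term"

@dataclass(frozen=True)
class Let:                     # let x = M1 in M2
    name: str
    bound: "Term"
    body: "Term"

@dataclass(frozen=True)
class Invoke:                  # #op(M)
    op: str
    arg: "Term"

@dataclass(frozen=True)
class Resume:                  # resume M
    arg: "Term"

@dataclass(frozen=True)
class OpClause:                # op(x) → M
    op: str
    param: str
    body: "Term"

@dataclass(frozen=True)
class Handler:                 # one return clause, then zero or more operation clauses
    ret_param: str
    ret_body: "Term"
    clauses: Tuple[OpClause, ...] = ()

@dataclass(frozen=True)
class Handle:                  # handle M with H
    body: "Term"
    handler: Handler

Term = Union[Var, Const, Lam, App, Let, Invoke, Resume, Handle]

# handle #get_id(()) with return x → x; get_id(y) → resume (λz. z)
example = Handle(
    Invoke("get_id", Const(())),
    Handler("x", Var("x"),
            (OpClause("get_id", "y", Resume(Lam("z", Var("z")))),)),
)
```

Keeping `Resume` as an ordinary term constructor mirrors the grammar; the restriction that it may occur only inside operation clauses is enforced by the type system (through the resumption type), not by the syntax.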

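We introduce the following notations used throughout this paper. We write
$\forall\boldsymbol{\alpha}^{i \in I}.A$ for $\forall\alpha_1. \cdots \forall\alpha_n.A$
where $I = \{1, \ldots, n\}$. We often omit indices ($i$ and $j$) and index sets ($I$
and $J$) if they are not important: e.g., we often abbreviate
$\forall\boldsymbol{\alpha}^{i \in I}.A$ to $\forall\boldsymbol{\alpha}^{I}.A$ or even
to $\forall\boldsymbol{\alpha}.A$. Similarly, we use a bold font for other sequences
($\boldsymbol{A}^{i \in I}$ for a sequence of types, $\boldsymbol{v}^{i \in I}$ for a
sequence of values, etc.). We sometimes write $\{\boldsymbol{\alpha}\}$ to view the
sequence $\boldsymbol{\alpha}$ as a set by ignoring the order. Free type variables
$\mathit{ftv}(\sigma)$ in a type scheme $\sigma$ and type substitution
$B[\boldsymbol{A}/\boldsymbol{\alpha}]$ of $\boldsymbol{A}$ for type variables
$\boldsymbol{\alpha}$ in $B$ are defined as usual (with the understanding that the
omitted index sets for $\boldsymbol{A}$ and $\boldsymbol{\alpha}$ are the same).

We suppose that each constant $c$ is assigned a first-order closed type
$\mathit{ty}(c)$ of the form
$\iota_1 \to \langle\rangle\; \cdots \to \langle\rangle\; \iota_n$ and that each effect
operation $\mathsf{op}$ is assigned a signature of the form
$\forall\boldsymbol{\alpha}.A \hookrightarrow B$, which means that an invocation of
$\mathsf{op}$ with type instantiation $\boldsymbol{C}$ takes an argument of
$A[\boldsymbol{C}/\boldsymbol{\alpha}]$ and returns a value of
$B[\boldsymbol{C}/\boldsymbol{\alpha}]$. We also assume that, for
$\mathit{ty}(\mathsf{op}) = \forall\boldsymbol{\alpha}.A \hookrightarrow B$,
$\mathit{ftv}(A) \subseteq \{\boldsymbol{\alpha}\}$ and
$\mathit{ftv}(B) \subseteq \{\boldsymbol{\alpha}\}$.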
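The instantiation of a signature can be made concrete with a short sketch. The encoding is an assumption made for illustration only: type variables and base types are strings, compound types are tagged tuples, effect annotations on arrows are omitted, and pairs are included although they are not part of the core calculus.

```python
def subst(ty, mapping):
    """Compute B[C/α]: replace the type variables of `ty` listed in `mapping`."""
    if isinstance(ty, str):                 # type variable or base type
        return mapping.get(ty, ty)
    tag, *parts = ty                        # e.g. ("fun", dom, cod) or ("pair", a, b)
    return (tag, *(subst(p, mapping) for p in parts))

def instantiate(signature, type_args):
    """Given ∀α.A ↪ B and types C, return (A[C/α], B[C/α])."""
    params, dom, cod = signature
    if len(params) != len(type_args):
        raise ValueError("wrong number of type arguments")
    mapping = dict(zip(params, type_args))
    return subst(dom, mapping), subst(cod, mapping)

# ∀α. α × α ↪ α   and   ∀α. unit ↪ (α → α)
CHOOSE = (["a"], ("pair", "a", "a"), "a")
GET_ID = (["a"], "unit", ("fun", "a", "a"))
```

For instance, `instantiate(CHOOSE, ["int"])` yields the argument type int × int and the result type int, which is the instance used by `#choose∀(1,-1)` in the random walk.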


3.2 Type System 


The type system of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ consists of four judgments:
well-formedness of typing contexts $\vdash \Gamma$; well-formedness of type schemes
$\Gamma \vdash \sigma$; term typing judgment $\Gamma; R \vdash M : A \mid \epsilon$,
which means that $M$ computes a value of $A$ possibly with effect $\epsilon$ under
typing context $\Gamma$ and resumption type $R$ (discussed below); and handler typing
judgment $\Gamma; R \vdash H : A \mid \epsilon \Rightarrow B \mid \epsilon'$, which means
that $H$ handles a computation that produces a value of $A$ with effect $\epsilon$ and
that the clauses in $H$ compute a value of $B$ possibly with effect $\epsilon'$ under
$\Gamma$ and $R$. A resumption type $R$ contains type information for resumption.


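Definition 1 (Resumption type). Resumption types in
$\lambda_{\mathrm{eff}}^{\mathrm{let}}$, denoted by $R$, are defined as follows:


\[
R ::= \mathsf{none} \mid (\boldsymbol{\alpha},\, x : A,\, B \to \epsilon\; C)
\qquad (\text{if } \mathit{ftv}(A) \cup \mathit{ftv}(B) \subseteq \{\boldsymbol{\alpha}\}
\text{ and } \mathit{ftv}(C) \cap \{\boldsymbol{\alpha}\} = \emptyset)
\]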


If $M$ is not a subterm of an operation clause, it is typechecked under
$R = \mathsf{none}$, which means that $M$ cannot contain resumption expressions.
Otherwise, suppose that $M$ is a subterm of an operation clause $\mathsf{op}(x) \to M'$
that handles effect $\mathsf{op}$ of signature
$\forall\boldsymbol{\alpha}.A \hookrightarrow B$ and computes a value of $C$ possibly
with effect $\epsilon$. Then, $M$ is typechecked under
$R = (\boldsymbol{\alpha},\, x : A,\, B \to \epsilon\; C)$, which means that argument $x$
to the operation clause has type $A$ and that resumptions in $M$ are effectful functions
from $B$ to $C$ with effect $\epsilon$. Note that type variables $\boldsymbol{\alpha}$
occur free only in $A$ and $B$ but not in $C$.

Figure 2 shows the inference rules of the judgments (except for $\Gamma \vdash \sigma$,
which is defined by: $\Gamma \vdash \sigma$ if and only if all free type variables in
$\sigma$ are bound by $\Gamma$). For a sequence of type schemes $\boldsymbol{\sigma}$, we
write $\Gamma \vdash \boldsymbol{\sigma}$ if and only if every type scheme in
$\boldsymbol{\sigma}$ is well formed under $\Gamma$.

Well-formedness rules for typing contexts, shown at the top of Fig. 2, are standard. A
typing context is well formed if it is empty (WF_EMPTY) or a variable in the typing
context is associated with a type scheme that is well formed in the remaining typing
context (WF_VAR) and a type variable in the typing context is not declared (WF_TVAR).
For typing context $\Gamma$, $\mathit{dom}(\Gamma)$ denotes the set of type and term
variables declared in $\Gamma$.
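Well-formedness rules for typing contexts   $\vdash \Gamma$

\[
\frac{}{\vdash \emptyset}\ \text{(WF\_EMPTY)}
\qquad
\frac{\vdash \Gamma \quad x \notin \mathit{dom}(\Gamma) \quad \Gamma \vdash \sigma}
     {\vdash \Gamma, x : \sigma}\ \text{(WF\_VAR)}
\qquad
\frac{\vdash \Gamma \quad \alpha \notin \mathit{dom}(\Gamma)}
     {\vdash \Gamma, \alpha}\ \text{(WF\_TVAR)}
\]

Typing rules   $\Gamma; R \vdash M : A \mid \epsilon$

\[
\frac{\vdash \Gamma \quad x : \forall\boldsymbol{\alpha}.A \in \Gamma \quad \Gamma \vdash \boldsymbol{B}}
     {\Gamma; R \vdash x : A[\boldsymbol{B}/\boldsymbol{\alpha}] \mid \epsilon}\ \text{(TS\_VAR)}
\qquad
\frac{\vdash \Gamma}
     {\Gamma; R \vdash c : \mathit{ty}(c) \mid \epsilon}\ \text{(TS\_CONST)}
\]

\[
\frac{\Gamma, x : A; R \vdash M : B \mid \epsilon'}
     {\Gamma; R \vdash \lambda x.M : A \to \epsilon'\; B \mid \epsilon}\ \text{(TS\_ABS)}
\]

\[
\frac{\Gamma; R \vdash M_1 : A \to \epsilon'\; B \mid \epsilon \quad
      \Gamma; R \vdash M_2 : A \mid \epsilon \quad \epsilon' \subseteq \epsilon}
     {\Gamma; R \vdash M_1\,M_2 : B \mid \epsilon}\ \text{(TS\_APP)}
\]

\[
\frac{\Gamma, \boldsymbol{\alpha}; R \vdash M_1 : A \mid \epsilon \quad
      \Gamma, x : \forall\boldsymbol{\alpha}.A; R \vdash M_2 : B \mid \epsilon}
     {\Gamma; R \vdash \mathsf{let}\ x = M_1\ \mathsf{in}\ M_2 : B \mid \epsilon}\ \text{(TS\_LET)}
\]

\[
\frac{\Gamma; R \vdash M : A \mid \epsilon' \quad \epsilon' \subseteq \epsilon}
     {\Gamma; R \vdash M : A \mid \epsilon}\ \text{(TS\_WEAK)}
\]

\[
\frac{\mathit{ty}(\mathsf{op}) = \forall\boldsymbol{\alpha}.A \hookrightarrow B \quad
      \mathsf{op} \in \epsilon \quad
      \Gamma; R \vdash M : A[\boldsymbol{C}/\boldsymbol{\alpha}] \mid \epsilon \quad
      \Gamma \vdash \boldsymbol{C}}
     {\Gamma; R \vdash \#\mathsf{op}(M) : B[\boldsymbol{C}/\boldsymbol{\alpha}] \mid \epsilon}\ \text{(TS\_OP)}
\]

\[
\frac{\Gamma; R \vdash M : A \mid \epsilon \quad
      \Gamma; R \vdash H : A \mid \epsilon \Rightarrow B \mid \epsilon'}
     {\Gamma; R \vdash \mathsf{handle}\ M\ \mathsf{with}\ H : B \mid \epsilon'}\ \text{(TS\_HANDLE)}
\]

\[
\frac{\vdash \Gamma_1, x : D, \Gamma_2 \quad \boldsymbol{\alpha} \in \Gamma_1 \quad \epsilon \subseteq \epsilon' \quad
      \Gamma_1, \Gamma_2, \boldsymbol{\beta}, x : A[\boldsymbol{\beta}/\boldsymbol{\alpha}];
      (\boldsymbol{\alpha}, x : A, B \to \epsilon\; C) \vdash M : B[\boldsymbol{\beta}/\boldsymbol{\alpha}] \mid \epsilon'}
     {\Gamma_1, x : D, \Gamma_2; (\boldsymbol{\alpha}, x : A, B \to \epsilon\; C) \vdash \mathsf{resume}\ M : C \mid \epsilon'}\ \text{(TS\_RESUME)}
\]

Handler typing rules   $\Gamma; R \vdash H : A \mid \epsilon \Rightarrow B \mid \epsilon'$

\[
\frac{\Gamma, x : A; R \vdash M : B \mid \epsilon' \quad \epsilon \subseteq \epsilon'}
     {\Gamma; R \vdash \mathsf{return}\ x \to M : A \mid \epsilon \Rightarrow B \mid \epsilon'}\ \text{(THS\_RETURN)}
\]

\[
\frac{\Gamma; R \vdash H : A \mid \epsilon \Rightarrow B \mid \epsilon' \quad
      \mathit{ty}(\mathsf{op}) = \forall\boldsymbol{\alpha}.C \hookrightarrow D \quad
      \Gamma, \boldsymbol{\alpha}, x : C; (\boldsymbol{\alpha}, x : C, D \to \epsilon'\; B) \vdash M : B \mid \epsilon'}
     {\Gamma; R \vdash H; \mathsf{op}(x) \to M : A \mid \epsilon \uplus \{\mathsf{op}\} \Rightarrow B \mid \epsilon'}\ \text{(THS\_OP)}
\]


Fig. 2. Typing rules.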


Typing rules for terms are given in the middle of Fig. 2. The first six rules are
standard for the lambda calculus with let-polymorphism and a type-and-effect system. If
a variable $x$ is introduced by a let-expression and has type scheme
$\forall\boldsymbol{\alpha}.A$ in $\Gamma$, it is given type
$A[\boldsymbol{B}/\boldsymbol{\alpha}]$, obtained by instantiating type variables
$\boldsymbol{\alpha}$ with well-formed types $\boldsymbol{B}$. If $x$ is bound by other
constructors (e.g., a lambda abstraction), $x$ is always bound to a monomorphic type and
both $\boldsymbol{\alpha}$ and $\boldsymbol{B}$ are the empty sequence. Note that
(TS_VAR) gives any effect $\epsilon$ to the typing judgment for $x$. In general,
$\epsilon$ in judgment $\Gamma; R \vdash M : A \mid \epsilon$ means that the evaluation
of $M$ may invoke effect operations in $\epsilon$. Since a reference to a variable
involves no effect, it is given any effect; for the same reason, value constructors are
also given any effect. The rule (TS_CONST) means that the type of a constant is given by
the (meta-level) function $\mathit{ty}$. The typing rules for lambda abstractions and
function applications are standard in the lambda calculus equipped with a
type-and-effect system. The rule (TS_ABS) gives lambda abstraction $\lambda x.M$
function type $A \to \epsilon'\; B$ if $M$ computes a value of $B$ possibly with effect
$\epsilon'$ by using $x$ of type $A$. The rule (TS_APP) requires that (1) the argument
type of function part $M_1$ be equivalent to the type of actual argument $M_2$ and (2)
effect $\epsilon'$ invoked by function $M_1$ be contained in the whole effect
$\epsilon$. The rule (TS_WEAK) allows weakening of effects.

The next two rules are mostly standard for algebraic effects and handlers. The rule
(TS_OP) is applied to effect invocations. Since $\lambda_{\mathrm{eff}}^{\mathrm{let}}$
supports implicit polymorphism, an invocation $\#\mathsf{op}(M)$ of polymorphic effect
$\mathsf{op}$ of signature $\forall\boldsymbol{\alpha}.A \hookrightarrow B$ also
accompanies implicit type substitution of well-formed types $\boldsymbol{C}$ for
$\boldsymbol{\alpha}$. Thus, the type of argument $M$ has to be
$A[\boldsymbol{C}/\boldsymbol{\alpha}]$ and the result of the invocation is given type
$B[\boldsymbol{C}/\boldsymbol{\alpha}]$. In addition, effect $\epsilon$ contains
$\mathsf{op}$. The typeability of handle-with expressions depends on the typing of
handlers (TS_HANDLE), which will be explained below shortly.
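As a worked instance of (TS_OP), consider the invocation `#choose∀(1,-1)` from the random walk of Sect. 2. The derivation step below is a sketch that assumes the extension of the calculus with pairs (so that $(1, -1)$ has a product type) and instantiates the single type parameter $\alpha$ with $\mathsf{int}$:

```latex
\[
\frac{\mathit{ty}(\mathsf{choose}^{\forall}) = \forall\alpha.\,\alpha \times \alpha \hookrightarrow \alpha
      \quad \mathsf{choose}^{\forall} \in \epsilon
      \quad \Gamma; R \vdash (1, -1) : (\alpha \times \alpha)[\mathsf{int}/\alpha] \mid \epsilon
      \quad \Gamma \vdash \mathsf{int}}
     {\Gamma; R \vdash \#\mathsf{choose}^{\forall}((1, -1)) : \alpha[\mathsf{int}/\alpha] \mid \epsilon}
\]
```

Here $(\alpha \times \alpha)[\mathsf{int}/\alpha] = \mathsf{int} \times \mathsf{int}$ and $\alpha[\mathsf{int}/\alpha] = \mathsf{int}$, so the invocation has type $\mathsf{int}$; choosing $\mathsf{bool}$ for $\alpha$ instead types `#choose∀(true,false)` at $\mathsf{bool}$.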

The last typing rule (TS_RESUME) is the key to gaining type safety in this work. Suppose
that we are given resumption type
$(\boldsymbol{\alpha},\, x : A,\, B \to \epsilon\; C)$. Intuitively,
$B \to \epsilon\; C$ is the type of the continuation for resumption and, therefore,
argument $M$ to resume is required to have type $B$. As we have discussed in Sect. 2, we
avoid interference between different resumptions by renaming $\boldsymbol{\alpha}$, the
type parameters to the effect operation, to fresh type variables $\boldsymbol{\beta}$,
in typechecking $M$. Freshness of $\boldsymbol{\beta}$ will be ensured when
well-formedness of typing contexts $\Gamma_1, \Gamma_2, \boldsymbol{\beta}, \ldots$ is
checked at the leaves of the type derivation. The type variables $\boldsymbol{\alpha}$
in the type of $x$, the parameter to the operation, are also renamed for $x$ to be
useful in $M$. To see why this renaming is useful, let us consider an extension of the
calculus with pairs and typechecking of an operation clause for choose∀ of signature
$\forall\alpha.\,\alpha \times \alpha \hookrightarrow \alpha$:

choose∀(x) → resume (fst x)


Variable $x$ is assigned product type $\alpha \times \alpha$ for fresh type variable
$\alpha$ and the body resume (fst x) is typechecked under the resumption type
$(\alpha,\, x : \alpha \times \alpha,\, \alpha \to \epsilon\; A)$ for some $\epsilon$ and
$A$ (see the typing rules for handlers for details). To typecheck resume (fst x), the
argument fst x is required to have type $\beta$, freshly generated for this resume.
Without applying renaming also to $x$, the clause would not typecheck. Finally,
(TS_RESUME) also requires that (1) the typing context contains $\boldsymbol{\alpha}$,
which should have been declared at an application of the typing rule for the operation
clause that surrounds this resume and (2) effect $\epsilon$, which may be invoked by
resumption of a continuation, be contained in the whole effect $\epsilon'$. The binding
$x : D$ in the conclusion means that parameter $x$ to the operation clause is declared
outside the resumption expression.

The typing rules for handlers are standard [3, 13, 16]. The rule (THS_RETURN) for a
return clause $\mathsf{return}\ x \to M$ checks that the body $M$ is given a type under
the assumption that argument $x$ has type $A$, which is the type of the handled
expression. The effect $\epsilon$ stands for effects that are not handled by the
operation clauses that follow the return clause and it must be a subset of the effect
$\epsilon'$ that $M$ may cause.⁵ A handler having operation clauses is typechecked by
(THS_OP), which checks that the body of the operation clause $\mathsf{op}(x) \to M$ for
$\mathsf{op}$ of signature $\forall\boldsymbol{\alpha}.C \hookrightarrow D$ is typed at
the result type $B$, which is the same as the type of the return clause, under the
typing context extended with freshly assigned type variables $\boldsymbol{\alpha}$ and
argument $x$ of type $C$, together with the resumption type
$(\boldsymbol{\alpha},\, x : C,\, D \to \epsilon'\; B)$. The effect
$\epsilon \uplus \{\mathsf{op}\}$ in the conclusion means that the effect operation
$\mathsf{op}$ is handled by this clause and no other clauses (in the present handler)
handle it. Our semantics adopts deep handlers [13], i.e., when a handled expression
invokes an effect operation, the continuation, which is passed to the operation clause,
is wrapped by the same handler. Thus, resumption may invoke the same effect $\epsilon'$
as the one possibly invoked by the clauses of the handler, hence $D \to \epsilon'\; B$
in the resumption type.

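Finally, we show how the type system rejects the counterexample given in
Sect. 2. The problem is in the following operation clause.


op(y) → resume λz1.(resume λz2.z1); z1


where $\mathsf{op}$ has effect signature
$\forall\alpha.\,\mathsf{unit} \hookrightarrow (\alpha \to \langle\rangle\; \alpha)$.
This clause is typechecked under resumption type
$(\alpha,\, y : \mathsf{unit},\, (\alpha \to \langle\rangle\; \alpha) \to \epsilon\; \alpha)$
for some $\epsilon$. By (TS_RESUME), the two resumption expressions are assigned two
different type variables $\gamma_1$ and $\gamma_2$, and the arguments
$\lambda z_1.(\mathsf{resume}\ \lambda z_2.z_1); z_1$ and $\lambda z_2.z_1$ are required
to have $\gamma_1 \to \epsilon\; \gamma_1$ and $\gamma_2 \to \epsilon\; \gamma_2$,
respectively. However, $\lambda z_2.z_1$ cannot because $z_1$ is associated with
$\gamma_1$ but not with $\gamma_2$.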


Remark. The rule (TS_RESUME) allows only the type of the argument to an operation
clause to be renamed. Thus, other variables bound by, e.g., lambda abstractions and
let-expressions outside the resumption expression cannot be used as such a type. As a
result, more care may be required as to where to introduce a new variable. For example,
let us consider the following operation clause (which is a variant of the example of
choose∀ above).


choose∀(x) → let y = fst x in resume y


The variable $x$ is assigned $\alpha \times \alpha$ first and the resumption requires
$y$ to be typed at fresh type variable $\beta$. This clause would be rejected in the
current type system because fst x appears outside resume and, therefore, $y$ is given
type $\alpha$, not $\beta$. This inconvenience may be addressed by moving down the
let-binding in some cases: e.g., resume (let y = fst x in y) is well typed.


5 Thus, handlers in $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ are open [13] in the sense
that a handle-with expression does not have to handle all effects caused by the handled
expression.


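4 Intermediate Language: $\lambda_{\mathrm{eff}}^{\Lambda}$

The semantics of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ is given by a formal
elaboration to an intermediate language $\lambda_{\mathrm{eff}}^{\Lambda}$, wherein type
abstraction and type application appear explicitly. We define the syntax, operational
semantics, and type system of $\lambda_{\mathrm{eff}}^{\Lambda}$ and the formal
elaboration from $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ to
$\lambda_{\mathrm{eff}}^{\Lambda}$. Finally, we show type safety of
$\lambda_{\mathrm{eff}}^{\mathrm{let}}$ via type preservation of the elaboration and type
soundness of $\lambda_{\mathrm{eff}}^{\Lambda}$.


\[
\begin{array}{llcl}
\text{Values} & v & ::= & c \mid \lambda x.e \\
\text{Polymorphic values} & w & ::= & v \mid \Lambda\alpha.w \\
\text{Terms} & e & ::= & x\,\boldsymbol{A} \mid c \mid \lambda x.e \mid e_1\,e_2 \mid
  \mathsf{let}\ x = \Lambda\boldsymbol{\alpha}.e_1\ \mathsf{in}\ e_2 \mid {} \\
 & & & \#\mathsf{op}(\boldsymbol{A}, e) \mid \#\mathsf{op}(\boldsymbol{\sigma}, w, E) \mid
  \mathsf{handle}\ e\ \mathsf{with}\ h \mid \mathsf{resume}\ \boldsymbol{\alpha}\ x.e \\
\text{Handlers} & h & ::= & \mathsf{return}\ x \to e \mid h; \Lambda\boldsymbol{\alpha}.\mathsf{op}(x) \to e \\
\text{Evaluation contexts} & E^{\boldsymbol{\alpha}^I} & ::= &
  [\,]\ (\text{if } \boldsymbol{\alpha}^I = \emptyset) \mid E^{\boldsymbol{\alpha}^I}\,e_2 \mid v_1\,E^{\boldsymbol{\alpha}^I} \mid {} \\
 & & & \mathsf{let}\ x = \Lambda\boldsymbol{\beta}^{J_1}.E^{\boldsymbol{\gamma}^{J_2}}\ \mathsf{in}\ e_2\
  (\text{if } \boldsymbol{\alpha}^I = \boldsymbol{\beta}^{J_1}, \boldsymbol{\gamma}^{J_2}) \mid {} \\
 & & & \#\mathsf{op}(\boldsymbol{A}^J, E^{\boldsymbol{\alpha}^I}) \mid \mathsf{handle}\ E^{\boldsymbol{\alpha}^I}\ \mathsf{with}\ h
\end{array}
\]


Fig. 3. Syntax of $\lambda_{\mathrm{eff}}^{\Lambda}$.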


4.1 Syntax 


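The syntax of $\lambda_{\mathrm{eff}}^{\Lambda}$ is shown in Fig. 3. Values, denoted by
$v$, consist of constants and lambda abstractions. Polymorphic values, denoted by $w$,
are values abstracted over types. Terms, denoted by $e$, and handlers, denoted by $h$,
are the same as those of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ except for the following
three points. First, type abstraction and type arguments are explicit in
$\lambda_{\mathrm{eff}}^{\Lambda}$: variables and effect invocations are accompanied by
a sequence of types and let-bound expressions, resumption expressions, and operation
clauses bind type variables. Second, a new term constructor of the form
$\#\mathsf{op}(\boldsymbol{\sigma}, w, E)$ is added. It represents an intermediate state
in which an effect invocation is capturing the continuation up to the closest handler
for $\mathsf{op}$. Here, $E$ is an evaluation context [6] and denotes a continuation to
be resumed by an operation clause handling $\mathsf{op}$. In the operational semantics,
an operation invocation $\#\mathsf{op}(\boldsymbol{A}, v)$ is first transformed to
$\#\mathsf{op}(\boldsymbol{A}, v, [\,])$ (where $[\,]$ denotes the empty context or the
identity continuation) and then it bubbles up by capturing its context and pushing it
onto the third argument. Note that $\boldsymbol{\sigma}$ and $w$ of
$\#\mathsf{op}(\boldsymbol{\sigma}, w, E)$ become polymorphic when it bubbles up from
the body of a type abstraction. Third, each resumption expression
$\mathsf{resume}\ \boldsymbol{\alpha}\ x.e$ declares distinct (type) variables
$\boldsymbol{\alpha}$ and $x$ to denote the (type) argument to an operation clause,
whereas a single variable declared at $\mathsf{op}(x) \to M$ and implicit type variables
are used for the same purpose in $\lambda_{\mathrm{eff}}^{\mathrm{let}}$. For example,
the $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ operation clause
choose∀(x) → resume (fst x) is translated to Λα.choose∀(x) → resume β y.(fst y).
This change simplifies the semantics.


Reduction rules   $e_1 \rightsquigarrow e_2$

\[
\begin{array}{rcll}
c_1\,c_2 & \rightsquigarrow & \zeta(c_1, c_2) & \text{(R\_CONST)} \\
(\lambda x.e)\,v & \rightsquigarrow & e[v/x] & \text{(R\_BETA)} \\
\mathsf{let}\ x = \Lambda\boldsymbol{\alpha}.v\ \mathsf{in}\ e & \rightsquigarrow & e[\Lambda\boldsymbol{\alpha}.v/x] & \text{(R\_LET)} \\
\mathsf{handle}\ v\ \mathsf{with}\ h & \rightsquigarrow & e[v/x]
  \quad (\text{where } h^{\mathsf{return}} = \mathsf{return}\ x \to e) & \text{(R\_RETURN)} \\
\#\mathsf{op}(\boldsymbol{A}, v) & \rightsquigarrow & \#\mathsf{op}(\boldsymbol{A}, v, [\,]) & \text{(R\_OP)} \\
\#\mathsf{op}(\boldsymbol{\sigma}, w, E)\,e_2 & \rightsquigarrow & \#\mathsf{op}(\boldsymbol{\sigma}, w, E\,e_2) & \text{(R\_OPAPP1)} \\
v_1\,\#\mathsf{op}(\boldsymbol{\sigma}, w, E) & \rightsquigarrow & \#\mathsf{op}(\boldsymbol{\sigma}, w, v_1\,E) & \text{(R\_OPAPP2)} \\
\#\mathsf{op}'(\boldsymbol{A}^I, \#\mathsf{op}(\boldsymbol{\sigma}^J, w, E)) & \rightsquigarrow &
  \#\mathsf{op}(\boldsymbol{\sigma}^J, w, \#\mathsf{op}'(\boldsymbol{A}^I, E)) & \text{(R\_OPOP)} \\
\mathsf{handle}\ \#\mathsf{op}(\boldsymbol{\sigma}, w, E)\ \mathsf{with}\ h & \rightsquigarrow &
  \#\mathsf{op}(\boldsymbol{\sigma}, w, \mathsf{handle}\ E\ \mathsf{with}\ h)
  \quad (\text{where } \mathsf{op} \notin \mathit{ops}(h)) & \text{(R\_OPHANDLE)} \\
\mathsf{let}\ x = \Lambda\boldsymbol{\alpha}^I.\#\mathsf{op}(\boldsymbol{\sigma}^J, w, E)\ \mathsf{in}\ e_2 & \rightsquigarrow &
  \#\mathsf{op}(\forall\boldsymbol{\alpha}^I.\boldsymbol{\sigma}^J, \Lambda\boldsymbol{\alpha}^I.w,
  \mathsf{let}\ x = \Lambda\boldsymbol{\alpha}^I.E\ \mathsf{in}\ e_2) & \text{(R\_OPLET)}
\end{array}
\]

\[
\begin{array}{l}
\mathsf{handle}\ \#\mathsf{op}(\forall\boldsymbol{\beta}^J.\boldsymbol{A}^I, \Lambda\boldsymbol{\beta}^J.v, E^{\boldsymbol{\beta}^J})\ \mathsf{with}\ h
  \ \rightsquigarrow \\
\qquad e[\mathsf{handle}\ E^{\boldsymbol{\beta}^J}\ \mathsf{with}\ h/\mathsf{resume}]^{\forall\boldsymbol{\beta}^J.\boldsymbol{A}^I}_{\Lambda\boldsymbol{\beta}^J.v}
  [\boldsymbol{A}^I[\boldsymbol{\bot}^J/\boldsymbol{\beta}^J]/\boldsymbol{\alpha}^I]
  [v[\boldsymbol{\bot}^J/\boldsymbol{\beta}^J]/x]
  \qquad \text{(R\_HANDLE)} \\
\qquad (\text{where } h^{\mathsf{op}} = \Lambda\boldsymbol{\alpha}^I.\mathsf{op}(x) \to e)
\end{array}
\]

Evaluation rules   $e_1 \longrightarrow e_2$

\[
\frac{e_1 \rightsquigarrow e_2}{E[e_1] \longrightarrow E[e_2]}\ \text{(E\_EVAL)}
\]


Fig. 4. Semantics of $\lambda_{\mathrm{eff}}^{\Lambda}$.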

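Evaluation contexts, denoted by $E^{\boldsymbol{\alpha}}$, are standard for the lambda
calculus with call-by-value, left-to-right evaluation except for two points. First, they
contain the form
$\mathsf{let}\ x = \Lambda\boldsymbol{\alpha}.E^{\boldsymbol{\beta}}\ \mathsf{in}\ e_2$,
which allows the body of a type abstraction to be evaluated. Second, the metavariable
$E$ for evaluation contexts is indexed by type variables $\boldsymbol{\alpha}$, meaning
that the hole in the context appears under type abstractions binding
$\boldsymbol{\alpha}$. For example,
$\mathsf{let}\ x = \Lambda\alpha.\mathsf{let}\ y = \Lambda\beta.[\,]\ \mathsf{in}\ e_2\ \mathsf{in}\ e_1$
is denoted by $E^{\alpha,\beta}$ and, more generally,
$\mathsf{let}\ x = \Lambda\boldsymbol{\beta}^{J_1}.E^{\boldsymbol{\gamma}^{J_2}}\ \mathsf{in}\ e$
is denoted by $E^{\boldsymbol{\beta}^{J_1},\boldsymbol{\gamma}^{J_2}}$. (Here,
$\boldsymbol{\beta}^{J_1}, \boldsymbol{\gamma}^{J_2}$ stands for the concatenation of
the two sequences $\boldsymbol{\beta}^{J_1}$ and $\boldsymbol{\gamma}^{J_2}$.) If
$\boldsymbol{\alpha}$ is not important, we simply write $E$ for
$E^{\boldsymbol{\alpha}}$. We often use the term "continuation" to mean "evaluation
context," especially when it is expected to be resumed.

As usual, substitution $e[w/x]$ of $w$ for $x$ in $e$ is defined in a capture-avoiding
manner. Since variables come along with type arguments, the case for variables
is defined as follows:


\[
(x\,\boldsymbol{A})[\Lambda\boldsymbol{\alpha}.v/x] \stackrel{\mathrm{def}}{=} v[\boldsymbol{A}/\boldsymbol{\alpha}]
\]


Application of substitution $[\Lambda\boldsymbol{\alpha}^I.v/x]$ to
$x\,\boldsymbol{A}^J$, where $I \neq J$, is undefined. We define free type variables
$\mathit{ftv}(e)$ and $\mathit{ftv}(E)$ in $e$ and $E$, respectively, as usual.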


4.2 Semantics 


The semantics of $\lambda_{\mathrm{eff}}^{\Lambda}$ is given in the small-step style and
consists of two relations: the reduction relation $\rightsquigarrow$, which is for basic
computation, and the evaluation relation $\longrightarrow$, which is for top-level
execution. Figure 4 shows the rules for these relations. In what follows, we write
$h^{\mathsf{return}}$ for the return clause of handler $h$, $\mathit{ops}(h)$ for the set
of effect operations handled by $h$, and $h^{\mathsf{op}}$ for the operation clause for
$\mathsf{op}$ in $h$.

Most of the reduction rules are standard [13,16]. A constant application $c_1\,c_2$
reduces to $\zeta(c_1, c_2)$ (R_CONST), where function $\zeta$ maps a pair of constants
to another constant. A function application $(\lambda x.e)\,v$ and a let-expression
$\mathsf{let}\ x = \Lambda\boldsymbol{\alpha}.v\ \mathsf{in}\ e$ reduce to $e[v/x]$
(R_BETA) and $e[\Lambda\boldsymbol{\alpha}.v/x]$ (R_LET), respectively. If a handled
expression is a value $v$, the handle-with expression reduces to the body of the return
clause where $v$ is substituted for the parameter $x$ (R_RETURN). An effect invocation
$\#\mathsf{op}(\boldsymbol{A}, v)$ reduces to $\#\mathsf{op}(\boldsymbol{A}, v, [\,])$
with the identity continuation, as explained above (R_OP); the process of capturing its
evaluation context is expressed by the rules (R_OPAPP1), (R_OPAPP2), (R_OPOP),
(R_OPHANDLE), and (R_OPLET). The rule (R_OPHANDLE) can be applied only if the handler
$h$ does not handle $\mathsf{op}$. The rule (R_OPLET) is applied to a let-expression
where $\#\mathsf{op}(\boldsymbol{\sigma}^J, w, E)$ appears under a type abstraction with
bound type variables $\boldsymbol{\alpha}^I$. Since $\boldsymbol{\sigma}^J$ and $w$ may
refer to $\boldsymbol{\alpha}^I$, the reduction result binds $\boldsymbol{\alpha}^I$ in
both $\boldsymbol{\sigma}^J$ and $w$. We write
$\forall\boldsymbol{\alpha}^I.\boldsymbol{\sigma}^J$ for a sequence
$\forall\boldsymbol{\alpha}^I.\sigma_{j_1}, \ldots, \forall\boldsymbol{\alpha}^I.\sigma_{j_n}$
of type schemes (where $J = \{j_1, \ldots, j_n\}$).

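The crux of the semantics is (R_HANDLE): it is applied when
$\#\mathsf{op}(\boldsymbol{\sigma}^I, w, E)$ reaches the handler $h$ that handles
$\mathsf{op}$. Since the handled term $\#\mathsf{op}(\boldsymbol{\sigma}^I, w, E)$ is
constructed from an effect invocation $\#\mathsf{op}(\boldsymbol{A}^I, v)$, if the
captured continuation $E$ binds type variables $\boldsymbol{\beta}^J$, the same type
variables $\boldsymbol{\beta}^J$ should have been added to $\boldsymbol{A}^I$ and $v$
along the capture. Thus, the handled expression on the left-hand side of the rule takes
the form
$\#\mathsf{op}(\forall\boldsymbol{\beta}^J.\boldsymbol{A}^I, \Lambda\boldsymbol{\beta}^J.v, E^{\boldsymbol{\beta}^J})$
(with the same type variables $\boldsymbol{\beta}^J$).

The right-hand side of (R_HANDLE) involves three types of substitution: continuation
substitution
$[\mathsf{handle}\ E^{\boldsymbol{\beta}^J}\ \mathsf{with}\ h/\mathsf{resume}]^{\forall\boldsymbol{\beta}^J.\boldsymbol{A}^I}_{\Lambda\boldsymbol{\beta}^J.v}$
for resumptions, type substitution for $\boldsymbol{\alpha}^I$, and value substitution
for $x$. We explain them one by one below. In the following, let
$h^{\mathsf{op}} = \Lambda\boldsymbol{\alpha}^I.\mathsf{op}(x) \to e$ and
$E'^{\boldsymbol{\beta}^J} = \mathsf{handle}\ E^{\boldsymbol{\beta}^J}\ \mathsf{with}\ h$.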
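Continuation Substitution. Let us start with a simple case where the sequence
$\boldsymbol{\beta}^J$ is empty. Intuitively, continuation substitution
$[E'/\mathsf{resume}]^{\boldsymbol{A}^I}_{v}$ replaces a resumption expression
$\mathsf{resume}\ \boldsymbol{\gamma}^I\ z.e'$ in the body $e$ with $E'[v']$, where $v'$
is the value of $e'$, and substitutes $\boldsymbol{A}^I$ and $v$ (arguments to the
invocation of $\mathsf{op}$) for $\boldsymbol{\gamma}^I$ and $z$, respectively.
Therefore, assuming resume does not appear in $e'$, we define
$(\mathsf{resume}\ \boldsymbol{\gamma}^I\ z.e')[E'/\mathsf{resume}]^{\boldsymbol{A}^I}_{v}$
to be
$\mathsf{let}\ y = e'[\boldsymbol{A}^I/\boldsymbol{\gamma}^I][v/z]\ \mathsf{in}\ E'[y]$
(for fresh $y$). Note that the evaluation of $e'$ takes place outside of $E$ so that an
invocation of an effect in $e'$ is not handled by handlers in $E$. When
$\boldsymbol{\beta}^J$ is not empty,

\[
(\mathsf{resume}\ \boldsymbol{\gamma}^I\ z.e')[E'^{\boldsymbol{\beta}^J}/\mathsf{resume}]^{\forall\boldsymbol{\beta}^J.\boldsymbol{A}^I}_{\Lambda\boldsymbol{\beta}^J.v}
\stackrel{\mathrm{def}}{=}
\mathsf{let}\ y = \Lambda\boldsymbol{\beta}^J.e'[\boldsymbol{A}^I/\boldsymbol{\gamma}^I][v/z]\ \mathsf{in}\ E'^{\boldsymbol{\beta}^J}[y\,\boldsymbol{\beta}^J].
\]

(The differences from the simple case are the type abstraction
$\Lambda\boldsymbol{\beta}^J$ at the let-binding and the type application
$y\,\boldsymbol{\beta}^J$.) The idea is to bind $\boldsymbol{\beta}^J$ that appear free
in $\boldsymbol{A}^I$ and $v$ by type abstraction at let and to instantiate with the
same variables at $y\,\boldsymbol{\beta}^J$, where $\boldsymbol{\beta}^J$ are bound by
type abstractions in $E'^{\boldsymbol{\beta}^J}$.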
Continuation substitution is formally defined as follows: 


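Definition 2 (Continuation substitution). Substitution of continuation
$E^{\boldsymbol{\beta}^J}$ for resumptions in $e$, written
$e[E^{\boldsymbol{\beta}^J}/\mathsf{resume}]^{\forall\boldsymbol{\beta}^J.\boldsymbol{A}^I}_{\Lambda\boldsymbol{\beta}^J.v}$,
is defined in a capture-avoiding manner, as follows (we describe only the important
cases):


\[
\begin{array}{l}
(\mathsf{resume}\ \boldsymbol{\gamma}^I\ z.e)[E^{\boldsymbol{\beta}^J}/\mathsf{resume}]^{\forall\boldsymbol{\beta}^J.\boldsymbol{A}^I}_{\Lambda\boldsymbol{\beta}^J.v}
 \stackrel{\mathrm{def}}{=} \\
\qquad \mathsf{let}\ y = \Lambda\boldsymbol{\beta}^J.e[E^{\boldsymbol{\beta}^J}/\mathsf{resume}]^{\forall\boldsymbol{\beta}^J.\boldsymbol{A}^I}_{\Lambda\boldsymbol{\beta}^J.v}[\boldsymbol{A}^I/\boldsymbol{\gamma}^I][v/z]\ \mathsf{in}\ E^{\boldsymbol{\beta}^J}[y\,\boldsymbol{\beta}^J] \\
\qquad (\text{if } (\mathit{ftv}(e) \cup \mathit{ftv}(E^{\boldsymbol{\beta}^J})) \cap \{\boldsymbol{\beta}^J\} = \emptyset \text{ and } y \text{ is fresh}) \\[1ex]
(\mathsf{return}\ x \to e)[E/\mathsf{resume}]^{\boldsymbol{\sigma}}_{w}
 \stackrel{\mathrm{def}}{=} \mathsf{return}\ x \to e[E/\mathsf{resume}]^{\boldsymbol{\sigma}}_{w} \\[1ex]
(h'; \Lambda\boldsymbol{\gamma}^J.\mathsf{op}(x) \to e)[E/\mathsf{resume}]^{\boldsymbol{\sigma}}_{w}
 \stackrel{\mathrm{def}}{=} h'[E/\mathsf{resume}]^{\boldsymbol{\sigma}}_{w}; \Lambda\boldsymbol{\gamma}^J.\mathsf{op}(x) \to e
\end{array}
\]


The second and third clauses (for a handler) mean that continuation substitution
is applied only to return clauses.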


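Type and Value Substitution. The type and value substitutions
$\boldsymbol{A}^I[\boldsymbol{\bot}^J/\boldsymbol{\beta}^J]$ and
$v[\boldsymbol{\bot}^J/\boldsymbol{\beta}^J]$, respectively, in (R_HANDLE) are for
(type) parameters in
$h^{\mathsf{op}} = \Lambda\boldsymbol{\alpha}^I.\mathsf{op}(x) \to e$. The basic idea is
to substitute $\boldsymbol{A}^I$ for $\boldsymbol{\alpha}^I$ and $v$ for $x$, similarly
to continuation substitution. We erase free type variables $\boldsymbol{\beta}^J$ in
$\boldsymbol{A}^I$ and $v$ by substituting the designated base type $\bot$ for all of
them. (We write $\boldsymbol{A}^I[\boldsymbol{\bot}^J/\boldsymbol{\beta}^J]$ and
$v[\boldsymbol{\bot}^J/\boldsymbol{\beta}^J]$ for the types and value, respectively,
after the erasure.)

The evaluation rule is ordinary: evaluation of a term proceeds by reducing
a subterm under an evaluation context.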
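With all types erased, the way an invocation bubbles up and is eventually handled can be mimicked in a few lines of Python. This is a sketch in the style of a free monad, not the calculus itself: `Op`, `bind`, and `handle` are hypothetical names, continuations are Python closures rather than syntactic evaluation contexts, and type abstraction, type erasure, and continuation substitution on syntax are not modeled.

```python
class Op:
    """#op(arg, k): an invocation together with the continuation captured so far."""
    def __init__(self, name, arg, k=lambda result: result):
        # The default k is the identity continuation, i.e., the hole [] of R_OP.
        self.name, self.arg, self.k = name, arg, k

def bind(m, rest):
    """Continue with `rest` once m has a value.

    If m is a pending invocation, `rest` is pushed onto its continuation,
    which is the role played by R_OPAPP1, R_OPAPP2, R_OPOP, and R_OPLET.
    """
    if isinstance(m, Op):
        return Op(m.name, m.arg, lambda r: bind(m.k(r), rest))
    return rest(m)

def handle(m, ret, clauses):
    """Deep handler: `ret` is the return clause, `clauses` maps names to clauses."""
    if not isinstance(m, Op):
        return ret(m)                                   # R_RETURN
    resume = lambda r: handle(m.k(r), ret, clauses)     # handle E with h
    if m.name in clauses:
        return clauses[m.name](m.arg, resume)           # R_HANDLE
    return Op(m.name, m.arg, resume)                    # R_OPHANDLE

# 10 + #choose(1, -1), handled by resuming with both components.
prog = bind(Op("choose", (1, -1)), lambda d: 10 + d)
both = handle(prog, lambda x: x,
              {"choose": lambda y, resume: resume(y[0]) + resume(y[1])})
```

Because `resume` is an ordinary function that re-installs the handler around the captured continuation, it can be called any number of times, which is exactly the multi-shot behavior that makes interfering resumptions possible.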


4.3 Type System 


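The type system of $\lambda_{\mathrm{eff}}^{\Lambda}$ is similar to that of
$\lambda_{\mathrm{eff}}^{\mathrm{let}}$ and has five judgments: well-formedness of typing
contexts $\vdash \Gamma$; well-formedness of type schemes $\Gamma \vdash \sigma$; term
typing judgment $\Gamma; r \vdash e : A \mid \epsilon$; handler typing judgment
$\Gamma; r \vdash h : A \mid \epsilon \Rightarrow B \mid \epsilon'$; and continuation
typing judgment
$\Gamma \vdash E : \forall\boldsymbol{\alpha}.A \multimap B \mid \epsilon$. The first
two are defined in the same way as those of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$. The
last judgment means that a term obtained by filling the hole of $E$ with a term having
$A$ under $\Gamma, \boldsymbol{\alpha}$ is typed at $B$ under $\Gamma$ and possibly
involves effect $\epsilon$. A resumption type $r$ is similar to $R$ but does not contain
an argument variable.


Definition 3 (Resumption type). Resumption types in $\lambda_{\mathrm{eff}}^{\Lambda}$,
denoted by $r$, are defined as follows:


\[
r ::= \mathsf{none} \mid (\boldsymbol{\alpha},\, A,\, B \to \epsilon\; C)
\qquad (\text{if } \mathit{ftv}(A) \cup \mathit{ftv}(B) \subseteq \{\boldsymbol{\alpha}\}
\text{ and } \mathit{ftv}(C) \cap \{\boldsymbol{\alpha}\} = \emptyset)
\]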


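Typing rules   $\Gamma; r \vdash e : A \mid \epsilon$

\[
\frac{\vdash \Gamma \quad x : \forall\boldsymbol{\alpha}.A \in \Gamma \quad \Gamma \vdash \boldsymbol{B}}
     {\Gamma; r \vdash x\,\boldsymbol{B} : A[\boldsymbol{B}/\boldsymbol{\alpha}] \mid \epsilon}\ \text{(T\_VAR)}
\qquad
\frac{\vdash \Gamma}
     {\Gamma; r \vdash c : \mathit{ty}(c) \mid \epsilon}\ \text{(T\_CONST)}
\]

\[
\frac{\Gamma, x : A; r \vdash e : B \mid \epsilon'}
     {\Gamma; r \vdash \lambda x.e : A \to \epsilon'\; B \mid \epsilon}\ \text{(T\_ABS)}
\]

\[
\frac{\Gamma; r \vdash e_1 : A \to \epsilon'\; B \mid \epsilon \quad
      \Gamma; r \vdash e_2 : A \mid \epsilon \quad \epsilon' \subseteq \epsilon}
     {\Gamma; r \vdash e_1\,e_2 : B \mid \epsilon}\ \text{(T\_APP)}
\]

\[
\frac{\mathit{ty}(\mathsf{op}) = \forall\boldsymbol{\alpha}.A \hookrightarrow B \quad
      \mathsf{op} \in \epsilon \quad
      \Gamma; r \vdash e : A[\boldsymbol{C}/\boldsymbol{\alpha}] \mid \epsilon \quad
      \Gamma \vdash \boldsymbol{C}}
     {\Gamma; r \vdash \#\mathsf{op}(\boldsymbol{C}, e) : B[\boldsymbol{C}/\boldsymbol{\alpha}] \mid \epsilon}\ \text{(T\_OP)}
\]

\[
\frac{\begin{array}{c}
      \mathit{ty}(\mathsf{op}) = \forall\boldsymbol{\alpha}^I.A \hookrightarrow B \quad
      \mathsf{op} \in \epsilon \quad
      \Gamma \vdash \forall\boldsymbol{\beta}^J.\boldsymbol{C}^I \\
      \Gamma, \boldsymbol{\beta}^J; r \vdash v : A[\boldsymbol{C}^I/\boldsymbol{\alpha}^I] \mid \epsilon \quad
      \Gamma \vdash E^{\boldsymbol{\beta}^J} : \forall\boldsymbol{\beta}^J.(B[\boldsymbol{C}^I/\boldsymbol{\alpha}^I]) \multimap D \mid \epsilon
      \end{array}}
     {\Gamma; r \vdash \#\mathsf{op}(\forall\boldsymbol{\beta}^J.\boldsymbol{C}^I, \Lambda\boldsymbol{\beta}^J.v, E^{\boldsymbol{\beta}^J}) : D \mid \epsilon}\ \text{(T\_OPCONT)}
\]

\[
\frac{\Gamma; r \vdash e : A \mid \epsilon' \quad \epsilon' \subseteq \epsilon}
     {\Gamma; r \vdash e : A \mid \epsilon}\ \text{(T\_WEAK)}
\]

\[
\frac{\Gamma; r \vdash e : A \mid \epsilon \quad
      \Gamma; r \vdash h : A \mid \epsilon \Rightarrow B \mid \epsilon'}
     {\Gamma; r \vdash \mathsf{handle}\ e\ \mathsf{with}\ h : B \mid \epsilon'}\ \text{(T\_HANDLE)}
\]

\[
\frac{\Gamma, \boldsymbol{\alpha}; r \vdash e_1 : A \mid \epsilon \quad
      \Gamma, x : \forall\boldsymbol{\alpha}.A; r \vdash e_2 : B \mid \epsilon}
     {\Gamma; r \vdash \mathsf{let}\ x = \Lambda\boldsymbol{\alpha}.e_1\ \mathsf{in}\ e_2 : B \mid \epsilon}\ \text{(T\_LET)}
\]

\[
\frac{\boldsymbol{\alpha} \in \Gamma \quad
      \Gamma, \boldsymbol{\beta}, x : A[\boldsymbol{\beta}/\boldsymbol{\alpha}];
      (\boldsymbol{\alpha}, A, B \to \epsilon\; C) \vdash e : B[\boldsymbol{\beta}/\boldsymbol{\alpha}] \mid \epsilon' \quad
      \epsilon \subseteq \epsilon'}
     {\Gamma; (\boldsymbol{\alpha}, A, B \to \epsilon\; C) \vdash \mathsf{resume}\ \boldsymbol{\beta}\ x.e : C \mid \epsilon'}\ \text{(T\_RESUME)}
\]


Fig. 5. Typing rules for terms in $\lambda_{\mathrm{eff}}^{\Lambda}$.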
The typing rules for terms, shown in Fig. 5, and handlers, shown in the upper half of
Fig. 6, are similar to those of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ except for a new
rule (T_OPCONT), which is applied to an effect invocation
$\#\mathsf{op}(\forall\boldsymbol{\beta}^J.\boldsymbol{C}^I, \Lambda\boldsymbol{\beta}^J.v, E^{\boldsymbol{\beta}^J})$
with a continuation. Let
$\mathit{ty}(\mathsf{op}) = \forall\boldsymbol{\alpha}^I.A \hookrightarrow B$. Since
$\mathsf{op}$ should have been invoked with $\boldsymbol{C}^I$ and $v$ under type
abstractions with bound type variables $\boldsymbol{\beta}^J$, the argument $v$ has type
$A[\boldsymbol{C}^I/\boldsymbol{\alpha}^I]$ under the typing context extended with
$\boldsymbol{\beta}^J$. Similarly, the hole of $E^{\boldsymbol{\beta}^J}$ expects to be
filled with the result of the invocation, i.e., a value of
$B[\boldsymbol{C}^I/\boldsymbol{\alpha}^I]$. Since the continuation denotes the context
before the evaluation, its result type matches the type of the whole term.

The typing rules for continuations are shown in the lower half of Fig. 6. They are
similar to the corresponding typing rules for terms except that a subterm is replaced
with a continuation. In (TE_LET), the continuation
$\mathsf{let}\ x = \Lambda\boldsymbol{\alpha}.E\ \mathsf{in}\ e$ has type
$\forall\boldsymbol{\alpha}.\sigma \multimap B$ because the hole of $E$ appears inside
the scope of $\boldsymbol{\alpha}$.


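Handler typing rules   $\Gamma; r \vdash h : A \mid \epsilon \Rightarrow B \mid \epsilon'$

\[
\frac{\Gamma, x : A; r \vdash e : B \mid \epsilon' \quad \epsilon \subseteq \epsilon'}
     {\Gamma; r \vdash \mathsf{return}\ x \to e : A \mid \epsilon \Rightarrow B \mid \epsilon'}\ \text{(TH\_RETURN)}
\]

\[
\frac{\Gamma; r \vdash h : A \mid \epsilon \Rightarrow B \mid \epsilon' \quad
      \mathit{ty}(\mathsf{op}) = \forall\boldsymbol{\alpha}.C \hookrightarrow D \quad
      \Gamma, \boldsymbol{\alpha}, x : C; (\boldsymbol{\alpha}, C, D \to \epsilon'\; B) \vdash e : B \mid \epsilon'}
     {\Gamma; r \vdash h; \Lambda\boldsymbol{\alpha}.\mathsf{op}(x) \to e : A \mid \epsilon \uplus \{\mathsf{op}\} \Rightarrow B \mid \epsilon'}\ \text{(TH\_OP)}
\]

Continuation typing rules   $\Gamma \vdash E : \sigma \multimap A \mid \epsilon$

\[
\frac{\vdash \Gamma}
     {\Gamma \vdash [\,] : A \multimap A \mid \epsilon}\ \text{(TE\_HOLE)}
\]

\[
\frac{\Gamma \vdash E : \sigma \multimap (A \to \epsilon'\; B) \mid \epsilon \quad
      \Gamma; \mathsf{none} \vdash e_2 : A \mid \epsilon \quad \epsilon' \subseteq \epsilon}
     {\Gamma \vdash E\,e_2 : \sigma \multimap B \mid \epsilon}\ \text{(TE\_APP1)}
\]

\[
\frac{\Gamma; \mathsf{none} \vdash v_1 : (A \to \epsilon'\; B) \mid \epsilon \quad
      \Gamma \vdash E : \sigma \multimap A \mid \epsilon \quad \epsilon' \subseteq \epsilon}
     {\Gamma \vdash v_1\,E : \sigma \multimap B \mid \epsilon}\ \text{(TE\_APP2)}
\]

\[
\frac{\mathit{ty}(\mathsf{op}) = \forall\boldsymbol{\alpha}.A \hookrightarrow B \quad
      \mathsf{op} \in \epsilon \quad
      \Gamma \vdash E : \sigma \multimap A[\boldsymbol{C}/\boldsymbol{\alpha}] \mid \epsilon \quad
      \Gamma \vdash \boldsymbol{C}}
     {\Gamma \vdash \#\mathsf{op}(\boldsymbol{C}, E) : \sigma \multimap B[\boldsymbol{C}/\boldsymbol{\alpha}] \mid \epsilon}\ \text{(TE\_OP)}
\]

\[
\frac{\Gamma \vdash E : \sigma \multimap A \mid \epsilon \quad
      \Gamma; \mathsf{none} \vdash h : A \mid \epsilon \Rightarrow B \mid \epsilon'}
     {\Gamma \vdash \mathsf{handle}\ E\ \mathsf{with}\ h : \sigma \multimap B \mid \epsilon'}\ \text{(TE\_HANDLE)}
\]

\[
\frac{\Gamma \vdash E : \sigma \multimap A \mid \epsilon' \quad \epsilon' \subseteq \epsilon}
     {\Gamma \vdash E : \sigma \multimap A \mid \epsilon}\ \text{(TE\_WEAK)}
\]

\[
\frac{\Gamma, \boldsymbol{\alpha} \vdash E : \sigma \multimap A \mid \epsilon \quad
      \Gamma, x : \forall\boldsymbol{\alpha}.A; \mathsf{none} \vdash e : B \mid \epsilon}
     {\Gamma \vdash \mathsf{let}\ x = \Lambda\boldsymbol{\alpha}.E\ \mathsf{in}\ e : \forall\boldsymbol{\alpha}.\sigma \multimap B \mid \epsilon}\ \text{(TE\_LET)}
\]


Fig. 6. Typing rules for handlers and continuations in $\lambda_{\mathrm{eff}}^{\Lambda}$.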
4.4 Elaboration


This section defines the elaboration from $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ to
$\lambda_{\mathrm{eff}}^{\Lambda}$. The important difference between the two languages
from the viewpoint of elaboration is that, whereas the parameter of an operation clause
is referred to by a single variable in $\lambda_{\mathrm{eff}}^{\mathrm{let}}$, it is
done by one or more variables in $\lambda_{\mathrm{eff}}^{\Lambda}$. Therefore, one
variable in $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ is represented by multiple variables
(required for each resume) in $\lambda_{\mathrm{eff}}^{\Lambda}$. We use $S$, a mapping
from variables to variables, to make the correspondence between variable names. We write
$S \circ \{x \mapsto y\}$ for the same mapping as $S$ except that $x$ is mapped to $y$.

Elaboration is defined by two judgments: term elaboration judgment
$\Gamma; R \vdash M : A \mid \epsilon \rhd^{S} e$, which denotes elaboration from a
typing derivation of judgment $\Gamma; R \vdash M : A \mid \epsilon$ to $e$ with $S$,
and handler elaboration judgment
$\Gamma; R \vdash H : A \mid \epsilon \Rightarrow B \mid \epsilon' \rhd^{S} h$, which
denotes elaboration from a typing derivation of judgment
$\Gamma; R \vdash H : A \mid \epsilon \Rightarrow B \mid \epsilon'$ to $h$ with $S$.


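Term elaboration rules: $\Gamma; R \vdash M : A \mid \epsilon \rhd^S e$

$$\frac{\vdash \Gamma \qquad x{:}\forall\alpha.A \in \Gamma \qquad \Gamma \vdash B}{\Gamma; R \vdash x : A[B/\alpha] \mid \epsilon \rhd^S S(x)\ B}\ \text{ELAB\_VAR}$$

$$\frac{\Gamma, x{:}A;\, R \vdash M : B \mid \epsilon' \rhd^{S \circ \{x \mapsto x\}} e}{\Gamma; R \vdash \lambda x.M : A \to_{\epsilon'} B \mid \epsilon \rhd^S \lambda x.e}\ \text{ELAB\_ABS}$$

$$\frac{\Gamma; R \vdash M : A \mid \epsilon \rhd^S e \qquad \Gamma; R \vdash H : A \mid \epsilon \Rightarrow B \mid \epsilon' \rhd^S h}{\Gamma; R \vdash \mathsf{handle}\ M\ \mathsf{with}\ H : B \mid \epsilon' \rhd^S \mathsf{handle}\ e\ \mathsf{with}\ h}\ \text{ELAB\_HANDLE}$$

$$\frac{\Gamma, \alpha;\, R \vdash M_1 : A \mid \epsilon \rhd^S e_1 \qquad S' = S \circ \{x \mapsto x\} \qquad \Gamma, x{:}\forall\alpha.A;\, R \vdash M_2 : B \mid \epsilon \rhd^{S'} e_2}{\Gamma; R \vdash \mathsf{let}\ x = M_1\ \mathsf{in}\ M_2 : B \mid \epsilon \rhd^S \mathsf{let}\ x = \Lambda\alpha.e_1\ \mathsf{in}\ e_2}\ \text{ELAB\_LET}$$

$$\frac{\begin{array}{c} R = (\alpha, x{:}A, B \to_{\epsilon} C) \qquad \vdash \Gamma_1, x{:}D, \Gamma_2 \qquad \alpha \in \Gamma_1 \qquad \epsilon \subseteq \epsilon' \\ y \text{ is fresh} \qquad S' = S \circ \{x \mapsto y\} \\ \Gamma_1, \Gamma_2, \beta, x{:}A[\beta/\alpha];\, R \vdash M : B[\beta/\alpha] \mid \epsilon' \rhd^{S'} e \end{array}}{\Gamma_1, x{:}D, \Gamma_2;\, R \vdash \mathsf{resume}\ M : C \mid \epsilon' \rhd^S \mathsf{resume}\ \beta\ y.e}\ \text{ELAB\_RESUME}$$

Handler elaboration rules: $\Gamma; R \vdash H : A \mid \epsilon \Rightarrow B \mid \epsilon' \rhd^S h$

$$\frac{\Gamma, x{:}A;\, R \vdash M : B \mid \epsilon' \rhd^{S \circ \{x \mapsto x\}} e \qquad \epsilon \subseteq \epsilon'}{\Gamma; R \vdash \mathsf{return}\ x \to M : A \mid \epsilon \Rightarrow B \mid \epsilon' \rhd^S \mathsf{return}\ x \to e}\ \text{ELABH\_RETURN}$$

$$\frac{\begin{array}{c} ty(\mathsf{op}) = \forall\alpha.C \hookrightarrow D \qquad \Gamma; R \vdash H : A \mid \epsilon \Rightarrow B \mid \epsilon' \rhd^S h \\ \Gamma, \alpha, x{:}C;\, (\alpha, x{:}C, D \to_{\epsilon'} B) \vdash M : B \mid \epsilon' \rhd^{S \circ \{x \mapsto x\}} e \end{array}}{\Gamma; R \vdash H;\ \mathsf{op}(x) \to M : A \mid \epsilon \uplus \{\mathsf{op}\} \Rightarrow B \mid \epsilon' \rhd^S h;\ \Lambda\alpha.\mathsf{op}(x) \to e}\ \text{ELABH\_OP}$$

Fig. 7. Elaboration rules (excerpt).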
Selected elaboration rules are shown in Fig. 7; the complete set of the rules is
found in the full version of the paper. The elaboration rules are straightforward 
except for the use of S. A variable x is translated to S(x) (ELAB_VAR) and, 
every time a new variable is introduced, S is extended: see the rules other than 
(ELAB_VAR) and (ELAB_HANDLE). 


4.5 Properties 


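We show type safety of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$, i.e., a well-typed program in $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ does not get stuck,
by proving (1) type preservation of the elaboration from $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ to $\lambda_{\mathrm{eff}}^{\Lambda}$, and (2)
type soundness of $\lambda_{\mathrm{eff}}^{\Lambda}$. Term $M$ is a well-typed program of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ if and only if
$\emptyset; \mathsf{none} \vdash M : A \mid \langle\rangle$.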

The first can be shown easily. We write Ø also for the identity mapping for 
variables. 


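Theorem 1 (Elaboration is type-preserving). If $M$ is a well-typed program
of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$, then $\emptyset; \mathsf{none} \vdash M : A \mid \langle\rangle \rhd^{\emptyset} e$ and $\emptyset; \mathsf{none} \vdash e : A \mid \langle\rangle$ for some $e$.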


We show the second, type soundness of $\lambda_{\mathrm{eff}}^{\Lambda}$, via progress and subject
reduction [25]. We write $\Delta$ for a typing context that consists only of type variables.
Progress can be shown as usual. 


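Lemma 1 (Progress). If $\Delta; \mathsf{none} \vdash e : A \mid \epsilon$, then (1) $e \longrightarrow e'$ for some $e'$,
(2) $e$ is a value, or (3) $e = \#\mathsf{op}(\sigma, w, E)$ for some $\mathsf{op} \in \epsilon$, $\sigma$, $w$, and $E$.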


A key lemma to show subject reduction is type preservation of continuation 
substitution. 


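Lemma 2 (Continuation substitution). Suppose that $\Gamma \vdash \forall\beta^J.C^I$ and
$\Gamma \vdash E^{\beta^J} : \forall\beta^J.(B[C^I/\alpha^I]) \multimap D \mid \epsilon$ and $\Gamma, \beta^J \vdash v : A[C^I/\alpha^I]$.


1. If $\Gamma; (\alpha^I, A, B \to_{\epsilon} D) \vdash e : D' \mid \epsilon'$, then
   $\Gamma; \mathsf{none} \vdash e[E^{\beta^J}/\mathsf{resume}]^{\forall\beta^J.C^I}_{\Lambda\beta^J.v} : D' \mid \epsilon'$.
2. If $\Gamma; (\alpha^I, A, B \to_{\epsilon} D) \vdash h : D_1 \mid \epsilon_1 \Rightarrow D_2 \mid \epsilon_2$, then
   $\Gamma; \mathsf{none} \vdash h[E^{\beta^J}/\mathsf{resume}]^{\forall\beta^J.C^I}_{\Lambda\beta^J.v} : D_1 \mid \epsilon_1 \Rightarrow D_2 \mid \epsilon_2$.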


Using the continuation substitution lemma as well as other lemmas, we show 
subject reduction. 


Lemma 3 (Subject reduction) 


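1. If $\Delta; \mathsf{none} \vdash e_1 : A \mid \epsilon$ and $e_1 \rightsquigarrow e_2$, then $\Delta; \mathsf{none} \vdash e_2 : A \mid \epsilon$.
2. If $\Delta; \mathsf{none} \vdash e_1 : A \mid \epsilon$ and $e_1 \longrightarrow e_2$, then $\Delta; \mathsf{none} \vdash e_2 : A \mid \epsilon$.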


We write $e \not\longrightarrow$ if and only if $e$ cannot evaluate further. Moreover, $\longrightarrow^*$
denotes the reflexive and transitive closure of the evaluation relation $\longrightarrow$.
Theorem 2 (Type soundness of $\lambda_{\mathrm{eff}}^{\Lambda}$). If $\Delta; \mathsf{none} \vdash e : A \mid \epsilon$ and $e \longrightarrow^* e'$
and $e' \not\longrightarrow$, then (1) $e'$ is a value or (2) $e' = \#\mathsf{op}(\sigma, w, E)$ for some $\mathsf{op} \in \epsilon$, $\sigma$,
$w$, and $E$.


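Now, type safety of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$ is obtained as a corollary of Theorems 1 and 2.

Corollary 1 (Type safety of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$). If $M$ is a well-typed program of $\lambda_{\mathrm{eff}}^{\mathrm{let}}$, there
exists some $e$ such that $\emptyset; \mathsf{none} \vdash M : A \mid \langle\rangle \rhd^{\emptyset} e$ and $e$ does not get stuck.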


5 Related Work 


5.1 Polymorphic Effects and Let-Polymorphism 


Many researchers have attacked the problem of combining effects (not necessarily
algebraic) and let-polymorphism so far [1,2,10,12,14,17,23,24]. In particular,
most of them have focused on ML-style polymorphic references. The algebraic
effect handlers dealt with in this paper seem to be unable to implement
general ML-style references (i.e., to give an appropriate implementation to a set of
effect operations new with the signature $\forall\alpha.\alpha \hookrightarrow \alpha\ \mathsf{ref}$, get with $\forall\alpha.\alpha\ \mathsf{ref} \hookrightarrow \alpha$,
and put with $\forall\alpha.\alpha \times \alpha\ \mathsf{ref} \hookrightarrow \mathsf{unit}$ for an abstract datatype $\alpha\ \mathsf{ref}$), even without the
restriction on handlers, because each operation clause in a handler assigns type
variables locally and it is impossible to share such type variables between
operation clauses. Nevertheless, their approaches would be applicable to algebraic
effects and handlers.

A common idea in the literature is to restrict the form of expressions bound 
by polymorphic let. Thus, they are complementary to our approach in that they 
restrict how effect operations are used whereas we restrict how effect operations 
are implemented. 

Value restriction [23,24], a standard way adopted in ML-like languages, 
restricts polymorphic let-bound expressions to syntactic values. Garrigue [10] 
relaxes the value restriction so that, if a let-bound expression is not a syntactic 
value, type variables that appear only at positive positions in the type of the 
expression can be generalized. Although the (relaxed) value restriction is a quite 
clear criterion that indicates what let-bound expressions can be polymorphic 
safely and it even accepts interfering handlers, it is too restrictive in some cases. 
We give an example for such a case below. 


effect choose∀ : ∀α. α × α ↪ α

let f1 () =
  let g = #choose∀ (fst, snd) in
  if g (true,false) then g (-1,1) else g (1,-1)


6 One possible approach to dealing with ML-style references is to extend algebraic 
effects and handlers so that a handler for parameterized effects can be connected 
with dynamic resources [3]. 
In the definition of function f1, variable g is used polymorphically. Execution
of this function under an appropriate handler would succeed, and in fact our
calculus accepts it. By contrast, the (relaxed) value restriction rejects it because
the let-bound expression #choose∀ (fst,snd) is not a syntactic value and the
type variable appears in both positive and negative positions, and so g is assigned
a monomorphic type. A workaround for this problem is to make a function
wrapper that calls either fst or snd depending on the Boolean value chosen
by choose∀:


let f2 () =
  let b = #choose∀ (true,false) in
  let g = λx. if b then (fst x) else (snd x) in
  if g (true,false) then g (-1,1) else g (1,-1)


However, this workaround makes the program complicated and incurs additional 
run-time cost for the branching and an extra call to the wrapper function. 

Asai and Kameyama [2] study a combination of let-polymorphism with delim- 
ited control operators shift/reset [4]. They allow a let-bound expression to be 
polymorphic if it invokes no control operation. Thus, the function f1 above would 
be rejected in their approach. 

Another research line to restrict the use of effects is to allow only type vari- 
ables unrelated to effect invocations to be generalized. Tofte [23] distinguishes 
between applicative type variables, which cannot be used for effect invocations, 
and imperative ones, which can be used, and proposes a type system that enforces 
restrictions that (1) type variables of imperative operations can be instantiated 
only with types wherein all type variables are imperative and (2) if a let-bound 
expression is not a syntactic value, only applicative type variables can be gener- 
alized. Leroy and Weis [17] allow generalization only of type variables that do not 
appear in a parameter type to the reference type in the type of a let-expression. 
To detect the hidden use of references, their type system gives a term not only 
a type but also the types of free variables used in the term. Standard ML of 
New Jersey (before ML97) adopted weak polymorphism [1], which was later 
formalized and investigated deeply by Hoang et al. [12]. Weak polymorphism 
equips a type variable with the number of function calls after which a value of a 
type containing the type variable will be passed to an imperative operation. The 
type system ensures that type variables with positive numbers are not related to 
imperative constructs, and so such type variables can be generalized safely. In 
this line of research, the function f1 above would not typecheck because general- 
ized type variables are used to instantiate those of the effect signature, although 
it could be rewritten to an acceptable one by taking care not to involve type 
variables in effect invocation. 


let f3 () =
  let g = if #choose∀ (true,false) then fst else snd in
  if g (true,false) then g (-1,1) else g (1,-1)


More recently, Kammar and Pretnar [14] show that parameterized algebraic 
effects and handlers do not need the value restriction if the type variables used 
in an effect invocation are not generalized. Thus, like the other work that restricts
generalized type variables, their approach would reject function f1 but would
accept f3.


5.2 Algebraic Effects and Handlers 


Algebraic effects [20] are a way to represent the denotation of an effect by giv- 
ing a set of operations and an equational theory that capture their properties. 
Algebraic effect handlers, introduced by Plotkin and Pretnar [21], make it pos- 
sible to provide user-defined effects. Algebraic effect handlers have been gaining 
popularity owing to their flexibility and have been made available as libraries 
[13,15,26] or as primitive features of languages, such as Eff [3], Koka [16], Frank 
[18], and Multicore OCaml [5]. In these languages, let-bound expressions that 
can be polymorphic are restricted to values or pure expressions. 

Recently, Forster et al. [9] investigate the relationships between algebraic 
effect handlers and other mechanisms for user-defined effects (delimited control
shift0 [19] and monadic reflection [7,8]), conjecturing that there would be no
type-preserving translation from a language with delimited control or monadic 
reflection to one with algebraic effect handlers. It would be an interesting direc- 
tion to export our idea to delimited control and monadic reflection. 


6 Conclusion 


There has been a long history of collaboration between effects and let- 
polymorphism. This work focuses on polymorphic algebraic effects and handlers, 
wherein the type signature of an effect operation can be polymorphic and an 
operation clause has a type binder, and shows that a naive combination of poly- 
morphic effects and let-polymorphism breaks type safety. Our novel observation 
to address this problem is that any let-bound expression can be polymorphic 
safely if resumptions from a handler do not interfere with each other. We for- 
malized this idea by developing a type system that requires the argument of 
each resumption expression to have a type obtained by renaming the type vari- 
ables assigned in the operation clause to those assigned in the resumption. We 
have proven that a well-typed program in our type system does not get stuck 
via elaboration to an intermediate language wherein type information appears 
explicitly. 

There are many directions for future work. The first is to address the prob- 
lem, described at the end of Sect. 3, that renaming the type variables assigned in 
an operation clause to those assigned in a resumption expression is allowed for 
the argument of the clause but not for variables bound by lambda abstractions 
and let-expressions outside the resumption expression. Second, we are interested 
in incorporating other features from the literature on algebraic effect handlers, 
such as dynamic resources [3] and parameterized algebraic effects, and restriction 
techniques that have been developed for type-safe imperative programming with 
let-polymorphism such as (relaxed) value restriction [10,23,24]. For example, we 
would like to develop a type system that enforces the non-interfering restriction
only to handlers implementing effect operations invoked in polymorphic compu- 
tation. We also expect that it is possible to determine whether implementations 
of an effect operation have no interfering resumption from the type signature of 
the operation, as relaxed value restriction makes it possible to find safely gener- 
alizable type variables from the type of a let-bound expression [10]. Finally, we 
are also interested in implementing our idea for a language with effect handlers 
such as Koka [16] and in applying the idea of analyzing handlers to other settings 
such as dependent typing. 
Abstract. Popular programming techniques such as shallow embeddings
of Domain Specific Languages (DSLs), finally tagless or object algebras 
are built on the principle of compositionality. However, existing program- 
ming languages only support simple compositional designs well, and have 
limited support for more sophisticated ones. 

This paper presents the $F_i^+$ calculus, which supports highly modular
and compositional designs that improve on existing techniques. These
improvements are due to the combination of three features: disjoint
intersection types with a merge operator; parametric (disjoint) polymorphism;
and BCD-style distributive subtyping. The main technical challenge is
$F_i^+$’s proof of coherence. A naive adaptation of ideas used in System F’s
parametricity to canonicity (the logical relation used by $F_i^+$ to prove
coherence) results in an ill-founded logical relation. To solve the problem
our canonicity relation employs a different technique based on immediate
substitutions and a restriction to predicative instantiations. Besides
coherence, we show several other important meta-theoretical results, such
as type-safety, sound and complete algorithmic subtyping, and decidability
of the type system. Remarkably, unlike $F_{<:}$’s bounded polymorphism,
disjoint polymorphism in $F_i^+$ supports decidable type-checking.


1 Introduction 


Compositionality is a desirable property in programming designs. Broadly 
defined, it is the principle that a system should be built by composing smaller 
subsystems. For instance, in the area of programming languages, composition- 
ality is a key aspect of denotational semantics [48,49], where the denotation 
of a program is constructed from the denotations of its parts. Compositional 
definitions have many benefits. One is ease of reasoning: since compositional 
definitions are recursively defined over smaller elements they can typically be 
reasoned about using induction. Another benefit is that compositional defini- 
tions are easy to extend, without modifying previous definitions. 

Programming techniques that support compositional definitions include: 
shallow embeddings of Domain Specific Languages (DSLs) [20], finally tag- 
less [11], polymorphic embeddings [26] or object algebras [35]. These techniques 
allow us to create compositional definitions, which are easy to extend without
modifications. Moreover, when modeling semantics, both finally tagless and
object algebras support multiple interpretations (or denotations) of syntax, thus 
offering a solution to the well-known Expression Problem [53]. Because of these 
benefits these techniques have become popular both in the functional and object- 
oriented programming communities. 

However, programming languages often only support simple compositional 
designs well, while support for more sophisticated compositional designs is lack- 
ing. For instance, once we have multiple interpretations of syntax, we may wish 
to compose them. Particularly useful is a merge combinator, which composes 
two interpretations [35,37,42] to form a new interpretation that, when executed, 
returns the results of both interpretations. 

The merge combinator can be manually defined in existing programming 
languages, and be used in combination with techniques such as finally tagless or 
object algebras. Moreover variants of the merge combinator are useful to model 
more complex combinations of interpretations. A good example are so-called 
dependent interpretations, where an interpretation does not depend only on 
itself, but also on a different interpretation. These definitions with dependencies 
are quite common in practice, and, although they are not orthogonal to the 
interpretation they depend on, we would like to model them (and also mutually 
dependent interpretations) in a modular and compositional style. 

Defining the merge combinator in existing programming languages is verbose 
and cumbersome, requiring code for every new kind of syntax. Yet, that code 
is essentially mechanical and ought to be automated. While using advanced 
meta-programming techniques enables automating the merge combinator to a 
large extent in existing programming languages [37,42], those techniques have 
several problems: error messages can be problematic, type-unsafe reflection is 
needed in some approaches [37] and advanced type-level features are required 
in others [42]. An alternative to the merge combinator that supports modular 
multiple interpretations and works in OO languages with support for some form 
of multiple inheritance and covariant type-refinement of fields has also been 
recently proposed [55]. While this approach is relatively simple, it still requires 
a lot of manual boilerplate code for composition of interpretations. 

This paper presents a calculus and polymorphic type system with (disjoint)
intersection types [36], called $F_i^+$. $F_i^+$ supports our broader notion of
compositional designs, and enables the development of highly modular and reusable
programs. $F_i^+$ has a built-in merge operator and a powerful subtyping relation
that are used to automate the composition of multiple (possibly dependent)
interpretations. In $F_i^+$ subtyping is coercive and enables the automatic
generation of coercions in a type-directed fashion. This process is similar to that of
other type-directed code generation mechanisms such as type classes [52], which
eliminate boilerplate code associated with the dictionary translation [52].

$F_i^+$ continues a line of research on disjoint intersection types. Previous work on
disjoint polymorphism (the $F_i$ calculus) [2] studied the combination of parametric
polymorphism and disjoint intersection types, but its subtyping relation does
not support BCD-style distributivity rules [3] and the type system also prevents
unrestricted intersections [16]. More recently the NeColus calculus (or $\lambda_i^+$) [5]
introduced a system with disjoint intersection types and BCD-style distributivity
rules, but did not account for parametric polymorphism. $F_i^+$ is unique in that it
combines all three features in a single calculus: disjoint intersection types and a
merge operator; parametric (disjoint) polymorphism; and a BCD-style subtyping
relation with distributivity rules. The three features together allow us to improve
upon the finally tagless and object algebra approaches and support advanced
compositional designs. Moreover, previous work on disjoint intersection types
has shown various other applications that are also possible in $F_i^+$, including:
first-class traits and dynamic inheritance [4], extensible records and dynamic
mixins [2], and nested composition and family polymorphism [5].

Unfortunately the combination of the three features has non-trivial compli- 
cations. The main technical challenge (like for most other calculi with disjoint 
intersection types) is the proof of coherence for $F_i^+$. Because of the presence
of BCD-style distributivity rules, our coherence proof is based on the recent
approach employed in $\lambda_i^+$ [5], which uses a heterogeneous logical relation called
canonicity. To account for polymorphism, which $\lambda_i^+$’s canonicity does not
support, we originally wanted to incorporate the relevant parts of System F’s logical
relation [43]. However, due to a mismatch between the two relations, this did 
not work. The parametricity relation has been carefully set up with a delayed 
type substitution to avoid ill-foundedness due to its impredicative polymorphism. 
Unfortunately, canonicity is a heterogeneous relation and needs to account for 
cases that cannot be expressed with the delayed substitution setup of the homo- 
geneous parametricity relation. Therefore, to handle those heterogeneous cases, 
we resorted to immediate substitutions and predicative instantiations. We do not 
believe that predicativity is a severe restriction in practice, since many source 
languages (e.g., those based on the Hindley-Milner type system like Haskell and 
OCaml) are themselves predicative and do not require the full generality of an 
impredicative core language. Should impredicative instantiation be required, we 
expect that step-indexing [1] can be used to recover well-foundedness, though at 
the cost of a much more complicated coherence proof. 

The formalization and metatheory of $F_i^+$ are a significant advance over that
of $F_i$. Besides the support for distributive subtyping, $F_i^+$ removes several
restrictions imposed by the syntactic coherence proof in $F_i$. In particular $F_i^+$ supports
unrestricted intersections, which are forbidden in $F_i$. Unrestricted intersections
enable, for example, encoding certain forms of bounded quantification [39].
Moreover, the new proof method is more robust with respect to language extensions.
For instance, $F_i^+$ supports the bottom type without significant complications in
the proofs, while it was a challenging open problem in $F_i$. A final interesting
aspect is that $F_i^+$’s type-checking is decidable. In the design space of languages
with polymorphism and subtyping, similar mechanisms have been known to
lead to undecidability. Pierce’s seminal paper “Bounded quantification is
undecidable” [40] shows that the contravariant subtyping rule for bounded
quantification in $F_{<:}$ leads to undecidability of subtyping. In $F_i^+$ the contravariant rule
for disjoint quantification retains decidability. Since with unrestricted
intersections $F_i^+$ can express several use cases of bounded quantification, $F_i^+$ could be
an interesting and decidable alternative to $F_{<:}$.

In summary the contributions of this paper are: 


- The $F_i^+$ calculus, which is the first calculus to combine disjoint intersection
  types, BCD-style distributive subtyping and disjoint polymorphism. We show
  several meta-theoretical results, such as type-safety, sound and complete
  algorithmic subtyping, coherence and decidability of the type system. $F_i^+$ includes
  the bottom type, which was considered to be a significant challenge in previous
  work on disjoint polymorphism [2].

- An extension of the canonicity relation with polymorphism, which
  enables the proof of coherence of $F_i^+$. We show that the ideas of System F’s
  parametricity cannot be ported to $F_i^+$. To overcome the problem we use a
  technique based on immediate substitutions and a predicativity restriction.

- Improved compositional designs: We show that $F_i^+$’s combination of
  features enables improved compositional programming designs and supports
  automated composition of interpretations in programming techniques like
  object algebras and finally tagless.

- Implementation and proofs: All of the metatheory of this paper, except
  some manual proofs of decidability, has been mechanically formalized in Coq.
  Furthermore, $F_i^+$ is implemented and all code presented in the paper is
  available. The implementation, Coq proofs and extended version with appendices
  can be found in https://github.com/bixuanzju/ESOP2019-artifact.


2 Compositional Programming 


To demonstrate the compositional properties of $F_i^+$ we use Gibbons and Wu’s
shallow embeddings of parallel prefix circuits [20]. By means of several different
shallow embeddings, we first illustrate the shortcomings of a state-of-the-art
compositional approach, popularly known as a finally tagless encoding [11], in
Haskell. Next we show how parametric polymorphism and distributive
intersection types provide a more elegant and compact solution in SEDEL [4], a source
language built on top of our $F_i^+$ calculus.


2.1 A Finally Tagless Encoding in Haskell 


The circuit DSL represents networks that map a number of inputs (known as the
width) of some type A onto the same number of outputs of the same type. The
outputs combine (with repetitions) one or more inputs using a binary associative
operator $\oplus : A \times A \to A$. A particularly interesting class of circuits that can be
expressed in the DSL are parallel prefix circuits. These represent computations
that take $n > 0$ inputs $x_1, \ldots, x_n$ and produce $n$ outputs $y_1, \ldots, y_n$, where
$y_i = x_1 \oplus x_2 \oplus \cdots \oplus x_i$.
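
As a small sanity check of this specification, the following Haskell sketch computes the outputs of a parallel prefix computation directly with the standard `scanl1` function. The name `prefixOutputs` and the sample inputs are illustrative assumptions and are not part of the circuit DSL.

```haskell
-- Reference semantics of a parallel prefix computation: output i combines
-- inputs 1..i with the associative operator `op` (written ⊕ in the text).
-- `prefixOutputs` is a hypothetical helper name introduced for this sketch.
prefixOutputs :: (a -> a -> a) -> [a] -> [a]
prefixOutputs op = scanl1 op

-- Example instantiations with made-up inputs.
sums :: [Int]
sums = prefixOutputs (+) [1, 2, 3, 4]

maxima :: [Int]
maxima = prefixOutputs max [3, 1, 4, 1, 5]
```

A circuit built from the DSL primitives is a parallel prefix circuit exactly when it computes the same outputs as this specification, possibly with less depth than the sequential left-to-right scan.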

The DSL features 5 language primitives: two basic circuit constructors and 
three circuit combinators. These are captured in the Haskell type class Circuit: 
class Circuit c where
  identity :: Int -> c
  fan      :: Int -> c
  beside   :: c -> c -> c
  above    :: c -> c -> c
  stretch  :: [Int] -> c -> c


(a) Width embedding

data Width = W { width :: Int }
instance Circuit Width where
  identity n   = W n
  fan n        = W n
  beside c1 c2 = W (width c1 + width c2)
  above c1 c2  = c1
  stretch ws c = W (sum ws)

(b) Depth embedding

data Depth = D { depth :: Int }
instance Circuit Depth where
  identity n   = D 0
  fan n        = D 1
  beside c1 c2 = D (max (depth c1) (depth c2))
  above c1 c2  = D (depth c1 + depth c2)
  stretch ws c = c

Fig. 1. Two finally tagless embeddings of circuits.


An identity circuit with $n$ inputs $x_i$ has $n$ outputs $y_i = x_i$. A fan circuit
has $n$ inputs $x_i$ and $n$ outputs $y_i$, where $y_1 = x_1$ and $y_j = x_1 \oplus x_j$ ($j > 1$).
The binary beside combinator puts two circuits in parallel; the combined circuit
takes the inputs of both circuits to the outputs of both circuits. The binary above
combinator connects the outputs of the first circuit to the inputs of the second;
the widths of both circuits have to be the same. Finally, stretch ws c interleaves the
wires of circuit c with bundles of additional wires that map their input straight
onto their output. The ws parameter specifies the width of the consecutive bundles;
the $i$th wire of c is preceded by a bundle of width $ws_i - 1$.


Basic width and depth embeddings. Figure 1 shows two simple shallow embed- 
dings, which represent a circuit respectively in terms of its width and its depth. 
The former denotes the number of inputs/outputs of a circuit, while the latter 
is the maximal number of $\oplus$ operators between any input and output. Both
definitions follow the same setup: a new Haskell datatype (Width/Depth) wraps 
the primitive result value and provides an instance of the Circuit type class 
that interprets the 5 DSL primitives accordingly. The following code creates a 
so-called Brent-Kung parallel prefix circuit [9]: 

e1 :: Width
e1 = above (beside (fan 2) (fan 2))
           (above (stretch [2, 2] (fan 2))
                  (beside (beside (identity 1) (fan 2)) (identity 1)))


Here e1 evaluates to W {width = 4}. If we want to know the depth of the circuit, 
we have to change type signature to Depth. 
Interpreting multiple ways. Fortunately, with the help of polymorphism we can
define a type of circuits that support multiple interpretations at once. 


type DCircuit = forall c. Circuit c => c


This way we can provide a single Brent-Kung parallel prefix circuit definition 
that can be reused for different interpretations. 


brentKung :: DCircuit 
brentKung = above (beside (fan 2) (fan 2)) 
(above (stretch [2, 2] (fan 2)) 
(beside (beside (identity 1) (fan 2)) (identity 1))) 


A type annotation then selects the desired interpretation. For instance, 
brentKung :: Width yields the width and brentKung :: Depth the depth. 


Composition of embeddings. What is not ideal in the above code is that the 
same brentKung circuit is processed twice, if we want to execute both interpre- 
tations. We can do better by processing the circuit only once, computing both 
interpretations simultaneously. The finally tagless encoding achieves this with a 
boilerplate instance for tuples of interpretations. 


instance (Circuit c1, Circuit c2) => Circuit (c1, c2) where
  identity n   = (identity n, identity n)
  fan n        = (fan n, fan n)
  beside c1 c2 = (beside (fst c1) (fst c2), beside (snd c1) (snd c2))
  above c1 c2  = (above (fst c1) (fst c2), above (snd c1) (snd c2))
  stretch ws c = (stretch ws (fst c), stretch ws (snd c))


Now we can get both embeddings simultaneously as follows: 


e12 :: (Width, Depth) 
e12 = brentKung 


This evaluates to (W {width = 4}, D {depth = 3}): by the Depth instance of Fig. 1, each of the three layers joined by above contributes depth 1.
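
For reference, the fragments of this section can be assembled into one self-contained module, sketched below. The `deriving` clauses and the name `both` are additions made here for illustration, and `brentKung` is given the constraint-style signature `Circuit c => c` (equivalent to `DCircuit`) so that no language extension is needed.

```haskell
-- The five DSL primitives, as in the Circuit type class of this section.
class Circuit c where
  identity :: Int -> c
  fan      :: Int -> c
  beside   :: c -> c -> c
  above    :: c -> c -> c
  stretch  :: [Int] -> c -> c

-- Width and depth embeddings (Fig. 1); Eq/Show are derived only for testing.
data Width = W { width :: Int } deriving (Eq, Show)
data Depth = D { depth :: Int } deriving (Eq, Show)

instance Circuit Width where
  identity n   = W n
  fan n        = W n
  beside a b   = W (width a + width b)
  above a _    = a
  stretch ws _ = W (sum ws)

instance Circuit Depth where
  identity _  = D 0
  fan _       = D 1
  beside a b  = D (max (depth a) (depth b))
  above a b   = D (depth a + depth b)
  stretch _ c = c

-- Boilerplate instance that runs two interpretations side by side.
instance (Circuit c1, Circuit c2) => Circuit (c1, c2) where
  identity n   = (identity n, identity n)
  fan n        = (fan n, fan n)
  beside a b   = (beside (fst a) (fst b), beside (snd a) (snd b))
  above a b    = (above (fst a) (fst b), above (snd a) (snd b))
  stretch ws c = (stretch ws (fst c), stretch ws (snd c))

-- One circuit definition, reusable at every interpretation.
brentKung :: Circuit c => c
brentKung =
  above (beside (fan 2) (fan 2))
        (above (stretch [2, 2] (fan 2))
               (beside (beside (identity 1) (fan 2)) (identity 1)))

-- Both interpretations computed in a single traversal (hypothetical name).
both :: (Width, Depth)
both = brentKung
```

Loading this module in GHCi and evaluating `both` exercises the tuple instance; `brentKung :: Width` and `brentKung :: Depth` select the individual interpretations instead.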


Composition of dependent interpretations. The composition above is easy 
because the two embeddings are orthogonal. In contrast, the composition of 
dependent interpretations is rather cumbersome in the standard finally tagless 
setup. An example of the latter is the interpretation of circuits as their well- 
sizedness, which captures whether circuits are well-formed. This interpretation 
depends on the interpretation of circuits as their width.¹


data WellSized = WS { wS :: Bool, ox :: Width }

instance Circuit WellSized where
  identity n   = WS True (identity n)
  fan n        = WS True (fan n)
  beside c1 c2 = WS (wS c1 && wS c2) (beside (ox c1) (ox c2))
  above c1 c2  = WS (wS c1 && wS c2 && width (ox c1) == width (ox c2))
                    (above (ox c1) (ox c2))
  stretch ws c = WS (wS c && length ws == width (ox c)) (stretch ws (ox c))


¹ Dependent recursion schemes are also known as zygomorphism [18] after the ancient
Greek word ζυγόν for yoke. We have labeled the Width field with ox because it is
pulling the yoke.



The WellSized datatype represents the well-sizedness of a circuit with a Boolean, 
and also keeps track of the circuit’s width. The 5 primitives compute the well- 
sizedness in terms of both the width and well-sizedness of the subcomponents. 
What makes the code cumbersome is that it has to explicitly delegate to the 
Width interpretation to collect this additional information. 
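
To see the dependency at work, the sketch below puts the Width and WellSized instances from this section into one module and evaluates them on a well-sized and an ill-sized circuit. The example circuits `good` and `bad` are hypothetical and chosen only for illustration.

```haskell
class Circuit c where
  identity :: Int -> c
  fan      :: Int -> c
  beside   :: c -> c -> c
  above    :: c -> c -> c
  stretch  :: [Int] -> c -> c

data Width = W { width :: Int }

instance Circuit Width where
  identity n   = W n
  fan n        = W n
  beside a b   = W (width a + width b)
  above a _    = a
  stretch ws _ = W (sum ws)

-- Well-sizedness carries the width interpretation along in the `ox` field.
data WellSized = WS { wS :: Bool, ox :: Width }

instance Circuit WellSized where
  identity n   = WS True (identity n)
  fan n        = WS True (fan n)
  beside a b   = WS (wS a && wS b) (beside (ox a) (ox b))
  -- `above` needs the widths of both subcircuits, hence the explicit delegation.
  above a b    = WS (wS a && wS b && width (ox a) == width (ox b))
                    (above (ox a) (ox b))
  -- `stretch` needs one bundle width per wire of the stretched circuit.
  stretch ws c = WS (wS c && length ws == width (ox c)) (stretch ws (ox c))

-- Hypothetical test circuits: the layers of `good` agree on width 4,
-- whereas `bad` stacks a width-2 circuit on a width-3 circuit.
good, bad :: WellSized
good = above (beside (fan 2) (fan 2)) (stretch [2, 2] (fan 2))
bad  = above (fan 2) (fan 3)
```

Every clause that inspects `width (ox ...)` is a place where the dependent interpretation reaches into the one it depends on, which is the boilerplate that the remainder of this section sets out to remove.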

With the help of a substantially more complicated setup that features a 
dozen Haskell language extensions, and advanced programming techniques, we 
can make the explicit delegation implicit (see the appendix). Nevertheless, that 
approach still requires a lot of boilerplate that needs to be repeated for each 
DSL, as well as explicit projections that need to be written in each interpreta- 
tion. Another alternative Haskell encoding that also enables multiple dependent 
interpretations is proposed by Zhang and Oliveira [55], but it does not elimi- 
nate the explicit delegation and still requires substantial amounts of boilerplate. 
A final remark is that adding new primitives (e.g., a “right stretch” rstretch 
combinator [25]) can also be easily achieved [46]. 


2.2 The SEDEL Encoding 


SEDEL is a source language that elaborates to $F_i^+$, adding a few convenient
source level constructs. The SEDEL setup of the circuit DSL is similar to the
finally tagless approach. Instead of a Circuit c type class, there is a Circuit[C]
type that gathers the 5 circuit primitives in a record. Like in Haskell, the type
parameter C expresses that the interpretation of circuits is a parameter.


type Circuit[C] = {
  identity : Int → C, fan : Int → C, beside : C → C → C,
  above : C → C → C, stretch : List[Int] → C → C };


As a side note, if a new constructor (e.g., rstretch) is needed, it can be added
by means of intersection types (& creates an intersection type) in SEDEL:


type NCircuit[C] = Circuit[C] & { rstretch : List[Int] -> C -> C };


Figure 2 shows the two basic shallow embeddings for width and depth. In both 
cases, a named SEDEL definition replaces the corresponding unnamed Haskell 
type class instance in providing the implementations of the 5 language primitives 
for a particular interpretation. 

The use of the SEDEL embeddings differs from that of their Haskell counterparts.
Where Haskell implicitly selects the appropriate type class instance
based on the available type information, in SEDEL the programmer explicitly
selects the implementation, following the style used by object algebras. The
following code does this by building a circuit with l1 (short for language1).


l1 = language1;
e1 = l1.above (l1.beside (l1.fan 2) (l1.fan 2))
type Width = { width : Int };
language1 : Circuit[Width] = {
  identity (n : Int) = { width = n },
  fan (n : Int) = { width = n },
  beside (c1 : Width) (c2 : Width) = { width = c1.width + c2.width },
  above (c1 : Width) (c2 : Width) = { width = c1.width },
  stretch (ws : List[Int]) (c : Width) = { width = sum ws } };


type Depth = { depth : Int }; 

language2 : Circuit[Depth] = { 
identity (n : Int) = { depth = 0 }, 
fan (n : Int) = { depth = 1 }, 
beside (c1 : Depth) (c2 : Depth) = { depth = max c1.depth c2.depth}, 
above (c1 : Depth) (c2 : Depth) = { depth = c1.depth + c2.depth}, 
stretch (ws : List[Int]) (c : Depth) = { depth = c.depth } }; 


Fig. 2. Two SEDEL embeddings of circuits. 


       (l1.above (l1.stretch (cons 2 (cons 2 nil)) (l1.fan 2))
          (l1.beside (l1.beside (l1.identity 1) (l1.fan 2)) (l1.identity 1)));


Here e1 evaluates to {width = 4}. If we want to know the depth of the circuit, 
we have to replicate the code with language2. 


Dynamically reusable circuits. Just like in Haskell, we can use polymorphism to 
define a type of circuits that can be interpreted with different languages. 


type DCircuit = { accept : forall C. Circuit[C] -> C };


In contrast to the Haskell solution, this implementation explicitly accepts the 
implementation. 


brentKung : DCircuit = {
  accept C l = l.above (l.beside (l.fan 2) (l.fan 2))
                 (l.above (l.stretch (cons 2 (cons 2 nil)) (l.fan 2))
                    (l.beside (l.beside (l.identity 1) (l.fan 2)) (l.identity 1))) };
e1 = brentKung.accept Width language1;
e2 = brentKung.accept Depth language2;
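
For comparison, the explicit-dictionary style used above can be mimicked in Haskell with a record of functions and a rank-2 type. The following is only a sketch of that analogy, not code from the paper: the record type Circuit, the newtype wrappers Width and Depth, and the use of RankNTypes are our own assumptions.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Record of the five circuit primitives, parameterised by the
-- interpretation c (a hypothetical stand-in for SEDEL's Circuit[C]).
data Circuit c = Circuit
  { identity :: Int -> c
  , fan      :: Int -> c
  , beside   :: c -> c -> c
  , above    :: c -> c -> c
  , stretch  :: [Int] -> c -> c
  }

newtype Width = Width { width :: Int } deriving (Eq, Show)
newtype Depth = Depth { depth :: Int } deriving (Eq, Show)

language1 :: Circuit Width
language1 = Circuit
  { identity = \n -> Width n
  , fan      = \n -> Width n
  , beside   = \c1 c2 -> Width (width c1 + width c2)
  , above    = \c1 _ -> Width (width c1)
  , stretch  = \ws _ -> Width (sum ws)
  }

language2 :: Circuit Depth
language2 = Circuit
  { identity = \_ -> Depth 0
  , fan      = \_ -> Depth 1
  , beside   = \c1 c2 -> Depth (max (depth c1) (depth c2))
  , above    = \c1 c2 -> Depth (depth c1 + depth c2)
  , stretch  = \_ c -> Depth (depth c)
  }

-- A circuit that works for every interpretation, mirroring DCircuit.
newtype DCircuit = DCircuit { accept :: forall c. Circuit c -> c }

brentKung :: DCircuit
brentKung = DCircuit (\l ->
  above l (beside l (fan l 2) (fan l 2))
          (above l (stretch l [2, 2] (fan l 2))
                   (beside l (beside l (identity l 1) (fan l 2)) (identity l 1))))

main :: IO ()
main = do
  print (accept brentKung language1)  -- width interpretation
  print (accept brentKung language2)  -- depth interpretation
```

As in SEDEL, the caller chooses the interpretation by passing a record explicitly; what this Haskell version lacks is a way to merge two such records without further code.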


Automatic composition of languages. Of course, like in Haskell we can also compute
both results simultaneously. However, unlike in Haskell, the composition of
the two interpretations requires no boilerplate whatsoever; in particular, there
is no SEDEL counterpart of the Circuit (c1, c2) instance. Instead, we can just
compose the two interpretations with the term-level merge operator (,,) and
specify the desired type Circuit[Width & Depth].


language3 : Circuit[Width & Depth] = language1 ,, language2;
e3 = brentKung.accept (Width & Depth) language3;


Here the use of the merge operator creates a term with the intersection type
Circuit[Width] & Circuit[Depth]. Implicitly, the SEDEL type system takes care
of the details, turning this intersection type into Circuit[Width & Depth]. This
is possible because intersection (&) distributes over function and record types (a
distinctive feature of BCD-style subtyping).
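
To make the contrast concrete, here is a sketch of the pairing code that a record-based Haskell encoding would need in place of the single merge above. The names pairLang, widthLang and depthLang are hypothetical, and the interpretations are simplified to plain Int results.

```haskell
data Circuit c = Circuit
  { identity :: Int -> c
  , fan      :: Int -> c
  , beside   :: c -> c -> c
  , above    :: c -> c -> c
  , stretch  :: [Int] -> c -> c
  }

-- Hand-written product of two interpretations: every primitive has to be
-- routed to both components explicitly.
pairLang :: Circuit a -> Circuit b -> Circuit (a, b)
pairLang la lb = Circuit
  { identity = \n -> (identity la n, identity lb n)
  , fan      = \n -> (fan la n, fan lb n)
  , beside   = \(a1, b1) (a2, b2) -> (beside la a1 a2, beside lb b1 b2)
  , above    = \(a1, b1) (a2, b2) -> (above la a1 a2, above lb b1 b2)
  , stretch  = \ws (a, b) -> (stretch la ws a, stretch lb ws b)
  }

-- Width interpretation, with the result represented directly as an Int.
widthLang :: Circuit Int
widthLang = Circuit
  { identity = id
  , fan      = id
  , beside   = (+)
  , above    = \w _ -> w
  , stretch  = \ws _ -> sum ws
  }

-- Depth interpretation, likewise as an Int.
depthLang :: Circuit Int
depthLang = Circuit
  { identity = const 0
  , fan      = const 1
  , beside   = max
  , above    = (+)
  , stretch  = \_ d -> d
  }

main :: IO ()
main = do
  let l = pairLang widthLang depthLang
  -- Width and depth of two 2-fans placed side by side, in one pass.
  print (beside l (fan l 2) (fan l 2))
```

The five field definitions of pairLang are exactly the boilerplate that the distributivity of & over records and functions makes unnecessary in SEDEL.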


Composition of dependent interpretations. In SEDEL the composition scales 
nicely to dependent interpretations. For instance, the well-sizedness interpre- 
tation can be expressed without explicit projections. 


type WellSized = { wS : Bool };
language4 = {
  identity (n : Int) = { wS = true },
  fan (n : Int) = { wS = true },
  above (c1 : WellSized & Width) (c2 : WellSized & Width) =
    { wS = c1.wS && c2.wS && c1.width == c2.width },
  beside (c1 : WellSized) (c2 : WellSized) = { wS = c1.wS && c2.wS },
  stretch (ws : List[Int]) (c : WellSized & Width) =
    { wS = c.wS && length ws == c.width } };


Here the WellSized & Width type in the above and stretch cases expresses that
both the well-sizedness and width of subcircuits must be given, and that the
width implementation is left as a dependency: when language4 is used,
the width implementation must be provided. Again, the distributive properties
of & in the type system take care of merging the two interpretations.


e4 = brentKung.accept (WellSized & Width) (language1 ,, language4);
main = e4.wS -- Output: true


Disjoint polymorphism and dynamic merges. While it may seem from the above 
examples that definitions have to be merged statically, SEDEL in fact supports 
dynamic merges. For instance, we can encapsulate the merge operator in the 
combine function while abstracting over the two components x and y that are 
merged as well as over their types A and B. 


combine A [B * A] (x : A) (y : B) = x ,, y;


This way the components x and y are only known at runtime and thus the merge
can only happen at that time. The types A and B cannot be chosen entirely freely.
For instance, if both components contributed an implementation for the
same method, it would be ambiguous which implementation the combination
provides. To avoid this problem the two types A and B have to be disjoint.
This is expressed in the disjointness constraint * A on the quantifier of the type
variable B. If a quantifier mentions no disjointness constraint, like that of A, it
defaults to the trivial * ⊤ constraint, which imposes no restriction.


3 Semantics of the Fᵢ⁺ Calculus


This section gives a formal account of Fᵢ⁺, the first typed calculus combining disjoint
polymorphism [2] (and disjoint intersection types) with BCD subtyping [3].
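
\[
\begin{array}{lrcl}
\text{Types} & A, B, C & ::= & \mathsf{Int} \mid \top \mid \bot \mid A \to B \mid A \,\&\, B \mid \{l : A\} \mid \alpha \mid \forall (\alpha * A).\, B \\
\text{Expressions} & E & ::= & x \mid i \mid \top \mid \lambda x.\, E \mid E_1\, E_2 \mid E_1 \,,,\, E_2 \mid E : A \mid \{l = E\} \mid E.l \\
 & & \mid & \Lambda (\alpha * A).\, E \mid E\, A \\
\text{Term contexts} & \Gamma & ::= & \bullet \mid \Gamma, x : A \\
\text{Type contexts} & \Delta & ::= & \bullet \mid \Delta, \alpha * A
\end{array}
\]


Fig. 3. Syntax of Fᵢ⁺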


The main differences to Fᵢ are in the subtyping, well-formedness and disjointness
relations. Fᵢ⁺ adds BCD subtyping and unrestricted intersections, and also closes
an open problem of Fᵢ by including the bottom type. The dynamic semantics
of Fᵢ⁺ is given by elaboration to the target calculus Fco, a variant of System F
extended with products and explicit coercions.


3.1 Syntax and Semantics 


Figure 3 shows the syntax of Fᵢ⁺. Metavariables A, B, C range over types. Types
include standard constructs from prior work [2,36]: integers Int, the top type ⊤,
arrows A → B, intersections A & B, single-field record types {l : A} and disjoint
quantification ∀(α * A). B. One novelty in Fᵢ⁺ is the addition of the uninhabited
bottom type ⊥. Metavariable E ranges over expressions. Expressions are integer
literals i, the top value ⊤, lambda abstractions λx. E, applications E1 E2, merges
E1 ,, E2, annotated terms E : A, single-field records {l = E}, record projections
E.l, type abstractions Λ(α * A). E and type applications E A.
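
As an informal aid, the grammar can be transcribed into Haskell data types. This is a sketch only; all constructor names are our own, and the example term is one possible core rendering of the combine function from Sect. 2.2, with an annotation added because unannotated lambdas can only be checked, not inferred.

```haskell
type Label = String
type Name  = String

-- Types A, B, C
data Ty
  = TInt
  | TTop
  | TBot
  | TArr Ty Ty
  | TAnd Ty Ty
  | TRcd Label Ty
  | TVar Name
  | TForall Name Ty Ty      -- forall (a * A). B
  deriving (Eq, Show)

-- Expressions E
data Expr
  = Var Name
  | Lit Int
  | Top
  | Lam Name Expr
  | App Expr Expr
  | Merge Expr Expr         -- E1 ,, E2
  | Anno Expr Ty            -- E : A
  | Rcd Label Expr          -- { l = E }
  | Proj Expr Label         -- E.l
  | TLam Name Ty Expr       -- type abstraction with constraint (a * A)
  | TApp Expr Ty            -- E A
  deriving (Eq, Show)

-- combine, abstracting over A (constraint Top) and B (constraint A).
combine :: Expr
combine =
  TLam "A" TTop $ TLam "B" (TVar "A") $
    Anno (Lam "x" (Lam "y" (Merge (Var "x") (Var "y"))))
         (TArr (TVar "A") (TArr (TVar "B") (TAnd (TVar "A") (TVar "B"))))

main :: IO ()
main = print combine
```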


Well-formedness and unrestricted intersections. Fᵢ⁺'s well-formedness judgment
of types Δ ⊢ A is standard, and only enforces well-scoping. This is one of the key
differences from Fᵢ, which uses well-formedness to also ensure that all intersection
types are disjoint. In other words, while in Fᵢ all valid intersection types must
be disjoint, in Fᵢ⁺ unrestricted intersection types such as Int & Int are allowed.
More specifically, the well-formedness of intersection types in Fᵢ⁺ and Fᵢ is:

\[
\dfrac{\Delta \vdash A \qquad \Delta \vdash B}{\Delta \vdash A \,\&\, B}\ \text{WF-}F_i^+
\qquad\qquad
\dfrac{\Delta \vdash A \qquad \Delta \vdash B \qquad \Delta \vdash A * B}{\Delta \vdash A \,\&\, B}\ \text{WF-}F_i
\]


Notice that Fᵢ has an extra disjointness condition Δ ⊢ A * B in the premise. This
is crucial for Fᵢ's syntactic method for proving coherence, but also burdens the
calculus with various syntactic restrictions and complicates its metatheory. For
example, it requires extra effort to show that Fᵢ only produces disjoint intersection
types. As a consequence, Fᵢ features a weaker substitution lemma (note the
gray part in Proposition 1, the premise Δ ⊢ B * C) than Fᵢ⁺ (Lemma 1).


Proposition 1 (Type substitution in Fᵢ). If Δ ⊢ A, Δ ⊢ B, (α * C) ∈ Δ,
Δ ⊢ B * C and well-formed context [B/α]Δ, then [B/α]Δ ⊢ [B/α]A.


Lemma 1 (Type substitution in Fᵢ⁺). If Δ ⊢ A, Δ ⊢ B, (α * C) ∈ Δ and
well-formed context [B/α]Δ, then [B/α]Δ ⊢ [B/α]A.
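
A <: B ⇝ co (Declarative subtyping)

\[
\begin{array}{ll}
\text{S-REFL} & A <: A \rightsquigarrow \mathsf{id} \\[1.5ex]
\text{S-TRANS} & \dfrac{A_2 <: A_3 \rightsquigarrow co_1 \quad A_1 <: A_2 \rightsquigarrow co_2}{A_1 <: A_3 \rightsquigarrow co_1 \circ co_2} \\[1.5ex]
\text{S-TOP} & A <: \top \rightsquigarrow \mathsf{top} \\[1.5ex]
\text{S-RCD} & \dfrac{A <: B \rightsquigarrow co}{\{l : A\} <: \{l : B\} \rightsquigarrow co} \\[1.5ex]
\text{S-ANDL} & A_1 \,\&\, A_2 <: A_1 \rightsquigarrow \pi_1 \\[1.5ex]
\text{S-ANDR} & A_1 \,\&\, A_2 <: A_2 \rightsquigarrow \pi_2 \\[1.5ex]
\text{S-ARR} & \dfrac{B_1 <: A_1 \rightsquigarrow co_1 \quad A_2 <: B_2 \rightsquigarrow co_2}{A_1 \to A_2 <: B_1 \to B_2 \rightsquigarrow co_1 \to co_2} \\[1.5ex]
\text{S-AND} & \dfrac{A_1 <: A_2 \rightsquigarrow co_1 \quad A_1 <: A_3 \rightsquigarrow co_2}{A_1 <: A_2 \,\&\, A_3 \rightsquigarrow \langle co_1, co_2 \rangle} \\[1.5ex]
\text{S-DISTARR} & (A_1 \to A_2) \,\&\, (A_1 \to A_3) <: A_1 \to A_2 \,\&\, A_3 \rightsquigarrow \mathsf{dist}_\to \\[1.5ex]
\text{S-TOPARR} & \top <: \top \to \top \rightsquigarrow \mathsf{top}_\to \\[1.5ex]
\text{S-DISTRCD} & \{l : A\} \,\&\, \{l : B\} <: \{l : A \,\&\, B\} \rightsquigarrow \mathsf{id} \\[1.5ex]
\text{S-TOPRCD} & \top <: \{l : \top\} \rightsquigarrow \mathsf{id} \\[1.5ex]
\text{S-BOT} & \bot <: A \rightsquigarrow \mathsf{bot} \\[1.5ex]
\text{S-FORALL} & \dfrac{B_1 <: B_2 \rightsquigarrow co \quad A_2 <: A_1}{\forall (\alpha * A_1).\, B_1 <: \forall (\alpha * A_2).\, B_2 \rightsquigarrow co_\forall} \\[1.5ex]
\text{S-TOPALL} & \top <: \forall (\alpha * \top).\, \top \rightsquigarrow \mathsf{top}_\forall \\[1.5ex]
\text{S-DISTALL} & (\forall (\alpha * A).\, B_1) \,\&\, (\forall (\alpha * A).\, B_2) <: \forall (\alpha * A).\, B_1 \,\&\, B_2 \rightsquigarrow \mathsf{dist}_\forall
\end{array}
\]


Fig. 4. Declarative subtyping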


Declarative subtyping. Fᵢ⁺'s subtyping judgment is another major difference to
Fᵢ, because it features BCD-style subtyping and a rule for the bottom type.
The full set of subtyping rules is shown in Fig. 4. The reader is advised to
ignore the gray parts (the coercions following ⇝) for now. Our subtyping rules
extend the BCD-style subtyping rules from λᵢ⁺ [5] with a rule for parametric
(disjoint) polymorphism (rule S-FORALL). Moreover, we have three new rules:
rule S-BOT for the bottom type, and rules S-DISTALL and S-TOPALL for
distributivity of disjoint quantification. The subtyping relation is reflexive and
transitive (rules S-REFL and S-TRANS), i.e., a preorder. Most
of the rules are quite standard. ⊥ is a subtype of all types (rule S-BOT).
Subtyping of disjoint quantification is covariant in its body, and contravariant in
its disjointness constraints (rule S-FORALL). Of particular interest are the so-called
"distributivity" rules: rule S-DISTARR says intersections distribute over
arrows; rule S-DISTRCD says intersections distribute over records. Similarly, rule
S-DISTALL dictates that intersections may distribute over disjoint quantifiers.
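
As a small worked instance (our own illustration, restricted to the identity field of the Circuit record from Sect. 2.2, with W and D abbreviating Width and Depth), the two distributivity rules combine as follows; rule S-TRANS links the two steps.

```latex
\begin{align*}
\{\mathit{identity} : \mathsf{Int} \to W\} \,\&\, \{\mathit{identity} : \mathsf{Int} \to D\}
  &<: \{\mathit{identity} : (\mathsf{Int} \to W) \,\&\, (\mathsf{Int} \to D)\}
  && \text{(S-DISTRCD)} \\
  &<: \{\mathit{identity} : \mathsf{Int} \to W \,\&\, D\}
  && \text{(S-RCD with S-DISTARR)}
\end{align*}
```

Repeating the argument for each of the five fields is what turns Circuit[Width] & Circuit[Depth] into Circuit[Width & Depth].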
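
Δ; Γ ⊢ E ⇒ A ⇝ e (Inference)

\[
\begin{array}{ll}
\text{T-TOP} & \dfrac{\vdash \Delta \quad \Delta \vdash \Gamma}{\Delta; \Gamma \vdash \top \Rightarrow \top \rightsquigarrow \langle\rangle} \\[1.5ex]
\text{T-NAT} & \dfrac{\vdash \Delta \quad \Delta \vdash \Gamma}{\Delta; \Gamma \vdash i \Rightarrow \mathsf{Int} \rightsquigarrow i} \\[1.5ex]
\text{T-VAR} & \dfrac{\vdash \Delta \quad \Delta \vdash \Gamma \quad (x : A) \in \Gamma}{\Delta; \Gamma \vdash x \Rightarrow A \rightsquigarrow x} \\[1.5ex]
\text{T-APP} & \dfrac{\Delta; \Gamma \vdash E_1 \Rightarrow A_1 \to A_2 \rightsquigarrow e_1 \quad \Delta; \Gamma \vdash E_2 \Leftarrow A_1 \rightsquigarrow e_2}{\Delta; \Gamma \vdash E_1\, E_2 \Rightarrow A_2 \rightsquigarrow e_1\, e_2} \\[1.5ex]
\text{T-MERGE} & \dfrac{\Delta; \Gamma \vdash E_1 \Rightarrow A_1 \rightsquigarrow e_1 \quad \Delta; \Gamma \vdash E_2 \Rightarrow A_2 \rightsquigarrow e_2 \quad \Delta \vdash A_1 * A_2}{\Delta; \Gamma \vdash E_1 \,,,\, E_2 \Rightarrow A_1 \,\&\, A_2 \rightsquigarrow \langle e_1, e_2 \rangle} \\[1.5ex]
\text{T-ANNO} & \dfrac{\Delta; \Gamma \vdash E \Leftarrow A \rightsquigarrow e}{\Delta; \Gamma \vdash E : A \Rightarrow A \rightsquigarrow e} \\[1.5ex]
\text{T-RCD} & \dfrac{\Delta; \Gamma \vdash E \Rightarrow A \rightsquigarrow e}{\Delta; \Gamma \vdash \{l = E\} \Rightarrow \{l : A\} \rightsquigarrow e} \\[1.5ex]
\text{T-PROJ} & \dfrac{\Delta; \Gamma \vdash E \Rightarrow \{l : A\} \rightsquigarrow e}{\Delta; \Gamma \vdash E.l \Rightarrow A \rightsquigarrow e} \\[1.5ex]
\text{T-TABS} & \dfrac{\Delta, \alpha * A; \Gamma \vdash E \Rightarrow B \rightsquigarrow e \quad \Delta \vdash A \quad \Delta \vdash \Gamma}{\Delta; \Gamma \vdash \Lambda (\alpha * A).\, E \Rightarrow \forall (\alpha * A).\, B \rightsquigarrow \Lambda \alpha.\, e} \\[1.5ex]
\text{T-TAPP} & \dfrac{\Delta; \Gamma \vdash E \Rightarrow \forall (\alpha * B).\, C \rightsquigarrow e \quad \Delta \vdash A * B}{\Delta; \Gamma \vdash E\, A \Rightarrow [A/\alpha]C \rightsquigarrow e\, |A|}
\end{array}
\]

Δ; Γ ⊢ E ⇐ A ⇝ e (Checking)

\[
\begin{array}{ll}
\text{T-ABS} & \dfrac{\Delta \vdash A \quad \Delta; \Gamma, x : A \vdash E \Leftarrow B \rightsquigarrow e}{\Delta; \Gamma \vdash \lambda x.\, E \Leftarrow A \to B \rightsquigarrow \lambda x.\, e} \\[1.5ex]
\text{T-SUB} & \dfrac{\Delta; \Gamma \vdash E \Rightarrow B \rightsquigarrow e \quad B <: A \rightsquigarrow co}{\Delta; \Gamma \vdash E \Leftarrow A \rightsquigarrow co\, e}
\end{array}
\]


Fig. 5. Bidirectional type system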


Typing rules. Fᵢ⁺ features a bidirectional type system inherited from Fᵢ. The
full set of typing rules is shown in Fig. 5. Again we ignore the gray parts
and explain them in Sect. 3.3. The inference judgment Δ; Γ ⊢ E ⇒ A says
that we can synthesize the type A under the contexts Δ and Γ. The checking
judgment Δ; Γ ⊢ E ⇐ A asserts that E checks against the type A under
the contexts Δ and Γ. Most of the rules are quite standard in the literature.
The merge expression E1 ,, E2 is well-typed if both sub-expressions are
well-typed, and their types are disjoint (rule T-MERGE). The disjointness relation
will be explained in Sect. 3.2. To infer a type abstraction (rule T-TABS), we
add disjointness constraints to the type context. For a type application (rule
T-TAPP), we check that the type argument satisfies the disjointness constraints.
Rules T-MERGE and T-TAPP are the only rules checking disjointness.
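
⌉A⌈ (Top-like types)

\[
\begin{array}{ll}
\text{TL-TOP} & \rceil \top \lceil \\[1.5ex]
\text{TL-AND} & \dfrac{\rceil A \lceil \quad \rceil B \lceil}{\rceil A \,\&\, B \lceil} \\[1.5ex]
\text{TL-ARR} & \dfrac{\rceil B \lceil}{\rceil A \to B \lceil} \\[1.5ex]
\text{TL-RCD} & \dfrac{\rceil A \lceil}{\rceil \{l : A\} \lceil} \\[1.5ex]
\text{TL-ALL} & \dfrac{\rceil B \lceil}{\rceil \forall (\alpha * A).\, B \lceil}
\end{array}
\]

Δ ⊢ A * B (Disjointness)

\[
\begin{array}{ll}
\text{D-TOPL} & \dfrac{\rceil A \lceil}{\Delta \vdash A * B} \\[1.5ex]
\text{D-TOPR} & \dfrac{\rceil B \lceil}{\Delta \vdash A * B} \\[1.5ex]
\text{D-ARR} & \dfrac{\Delta \vdash A_2 * B_2}{\Delta \vdash A_1 \to A_2 * B_1 \to B_2} \\[1.5ex]
\text{D-ANDL} & \dfrac{\Delta \vdash A_1 * B \quad \Delta \vdash A_2 * B}{\Delta \vdash A_1 \,\&\, A_2 * B} \\[1.5ex]
\text{D-ANDR} & \dfrac{\Delta \vdash A * B_1 \quad \Delta \vdash A * B_2}{\Delta \vdash A * B_1 \,\&\, B_2} \\[1.5ex]
\text{D-RCDEQ} & \dfrac{\Delta \vdash A * B}{\Delta \vdash \{l : A\} * \{l : B\}} \\[1.5ex]
\text{D-RCDNEQ} & \dfrac{l_1 \neq l_2}{\Delta \vdash \{l_1 : A\} * \{l_2 : B\}} \\[1.5ex]
\text{D-TVARL} & \dfrac{(\alpha * A) \in \Delta \quad A <: B}{\Delta \vdash \alpha * B} \\[1.5ex]
\text{D-TVARR} & \dfrac{(\alpha * A) \in \Delta \quad A <: B}{\Delta \vdash B * \alpha} \\[1.5ex]
\text{D-FORALL} & \dfrac{\Delta, \alpha * A_1 \,\&\, A_2 \vdash B_1 * B_2}{\Delta \vdash \forall (\alpha * A_1).\, B_1 * \forall (\alpha * A_2).\, B_2} \\[1.5ex]
\text{D-AX} & \dfrac{A *_{ax} B}{\Delta \vdash A * B}
\end{array}
\]


Fig. 6. Selected rules for disjointness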


3.2 Disjointness 


We now turn to another core judgment of Fᵢ⁺, the disjointness relation, shown
in Fig. 6. The disjointness rules are mostly inherited from Fᵢ [2], but the new
bottom type requires a notable change regarding disjointness with top-like types.


Top-like types. Top-like types are all types that are isomorphic to ⊤ (i.e.,
simultaneously sub- and supertypes of ⊤). Hence, they are inhabited by a single value,
isomorphic to the ⊤ value. Figure 6 captures this notion in a syntax-directed
fashion in the ⌉A⌈ predicate. As a historical note, the concept of top-like types
was already known by Barendregt et al. [3]. The λᵢ calculus [36] re-discovered it
and coined the term "top-like types"; the Fᵢ calculus [2] extended it with
universal quantifiers. Note that in both calculi, top-like types are solely employed for
enabling a syntactic method of proving coherence, and due to the lack of BCD
subtyping, they do not have a type-theoretic interpretation there.


Disjointness rules. The disjointness judgment Δ ⊢ A * B is helpful to check
whether the merge of two expressions of type A and B preserves coherence. 
Incoherence arises when both expressions produce distinct values for the same 
type, either directly when they are both of that same type, or through implicit 
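
\[
\begin{array}{lrcl}
\text{Types} & \tau & ::= & \mathsf{Int} \mid \langle\rangle \mid \tau_1 \to \tau_2 \mid \tau_1 \times \tau_2 \mid \alpha \mid \forall \alpha.\, \tau \\
\text{Terms} & e & ::= & x \mid i \mid \langle\rangle \mid \lambda x.\, e \mid e_1\, e_2 \mid \langle e_1, e_2 \rangle \mid \Lambda \alpha.\, e \mid e\, \tau \mid co\, e \\
\text{Coercions} & co & ::= & \mathsf{id} \mid co_1 \circ co_2 \mid \mathsf{top} \mid \mathsf{bot} \mid co_1 \to co_2 \mid \langle co_1, co_2 \rangle \mid \pi_1 \mid \pi_2 \\
 & & \mid & co_\forall \mid \mathsf{dist}_\to \mid \mathsf{top}_\to \mid \mathsf{top}_\forall \mid \mathsf{dist}_\forall \\
\text{Values} & v & ::= & i \mid \langle\rangle \mid \lambda x.\, e \mid \langle v_1, v_2 \rangle \mid \Lambda \alpha.\, e \mid (co_1 \to co_2)\, v \mid co_\forall\, v \\
 & & \mid & \mathsf{dist}_\to\, v \mid \mathsf{top}_\to\, v \mid \mathsf{top}_\forall\, v \mid \mathsf{dist}_\forall\, v \\
\text{Term contexts} & \Psi & ::= & \bullet \mid \Psi, x : \tau \\
\text{Type contexts} & \Phi & ::= & \bullet \mid \Phi, \alpha \\
\text{Evaluation contexts} & \mathcal{E} & ::= & [\cdot] \mid \mathcal{E}\, e \mid v\, \mathcal{E} \mid \langle \mathcal{E}, e \rangle \mid \langle v, \mathcal{E} \rangle \mid co\, \mathcal{E} \mid \mathcal{E}\, \tau
\end{array}
\]


Fig. 7. Syntax of Fco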


upcasting to a common supertype. Of course we can safely disregard top-like
types in this matter because they do not have two distinct values. In short, it
suffices to check that the two types have only top-like supertypes in common.

Because ⊥ and any other type A always have A as a common supertype,
it follows that ⊥ is only disjoint to A when A is top-like. More generally, if A is
a top-like type, then A is disjoint to any type. This is the rationale behind the
two rules D-TOPL and D-TOPR, which generalize and subsume Δ ⊢ ⊤ * A and
Δ ⊢ A * ⊤ from Fᵢ, and also cater to the bottom type. Two other interesting
rules are D-TVARL and D-TVARR, which dictate that a type variable α is disjoint
with some type B if its disjointness constraint A is a subtype of B. Disjointness
axioms A *ax B (appearing in rule D-AX) take care of two types with different type
constructors (e.g., Int and records). Axiom rules can be found in the appendix.
Finally we note that the disjointness relation is symmetric.
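
The top-like predicate and the disjointness rules can be read as a recursive check. The sketch below covers only the monomorphic fragment (no type variables or quantifiers, so rules D-TVARL, D-TVARR and D-FORALL are omitted), and it assumes that the axioms relate exactly the pairs of distinct constructors among Int, arrows and records; the appendix's axiom rules are not reproduced here. All names are ours.

```haskell
type Label = String

-- Monomorphic fragment of the source types.
data Ty
  = TInt
  | TTop
  | TBot
  | TArr Ty Ty
  | TAnd Ty Ty
  | TRcd Label Ty
  deriving (Eq, Show)

-- Top-like predicate, one clause per TL-* rule of the fragment.
topLike :: Ty -> Bool
topLike TTop       = True
topLike (TAnd a b) = topLike a && topLike b
topLike (TArr _ b) = topLike b
topLike (TRcd _ a) = topLike a
topLike _          = False

-- Disjointness check for the fragment.
disjoint :: Ty -> Ty -> Bool
disjoint a b
  | topLike a || topLike b = True                          -- D-TOPL, D-TOPR
disjoint (TAnd a1 a2) b = disjoint a1 b && disjoint a2 b   -- D-ANDL
disjoint a (TAnd b1 b2) = disjoint a b1 && disjoint a b2   -- D-ANDR
disjoint (TArr _ a2) (TArr _ b2) = disjoint a2 b2          -- D-ARR
disjoint (TRcd l1 a) (TRcd l2 b)
  | l1 == l2  = disjoint a b                               -- D-RCDEQ
  | otherwise = True                                       -- D-RCDNEQ
disjoint TBot _ = False     -- bottom is disjoint only to top-like types
disjoint _ TBot = False
disjoint TInt TInt = False  -- Int is a shared, non-top-like supertype
disjoint _ _ = True         -- assumed axioms: different type constructors

main :: IO ()
main = do
  let widthTy = TRcd "width" TInt
      depthTy = TRcd "depth" TInt
  print (disjoint widthTy depthTy)          -- records with distinct labels
  print (disjoint TInt TInt)                -- same type on both sides
  print (disjoint TBot (TArr TInt TTop))    -- right-hand side is top-like
```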


3.3 Elaboration and Type Safety 


The dynamic semantics of Fᵢ⁺ is given by elaboration into a target calculus.
The target calculus Fco is the standard call-by-value System F extended with
products and coercions. The syntax of Fco is shown in Fig. 7.


Type translation. Definition 1 defines the type translation function |·| from
Fᵢ⁺ types A to Fco types τ. Most cases are straightforward. For example, ⊥
is mapped to an uninhabited type ∀α. α; disjoint quantification is mapped to
universal quantification, dropping the disjointness constraints. |·| is naturally
extended to work on contexts as well.


Definition 1. Type translation |·| is defined as follows:

\[
\begin{array}{rcl@{\qquad}rcl@{\qquad}rcl}
|\mathsf{Int}| &=& \mathsf{Int} & |\top| &=& \langle\rangle & |A \to B| &=& |A| \to |B| \\
|A \,\&\, B| &=& |A| \times |B| & |\{l : A\}| &=& |A| & |\alpha| &=& \alpha \\
|\bot| &=& \forall \alpha.\, \alpha & |\forall (\alpha * A).\, B| &=& \forall \alpha.\, |B|
\end{array}
\]
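
Definition 1 transcribes directly into a recursive function. The following sketch uses our own constructor names for source types (Ty) and target types (TTy); the fixed binder name "a" in the ⊥ case is an arbitrary choice.

```haskell
type Label = String
type Name  = String

-- Source types.
data Ty
  = TInt | TTop | TBot
  | TArr Ty Ty
  | TAnd Ty Ty
  | TRcd Label Ty
  | TVar Name
  | TForall Name Ty Ty      -- forall (a * A). B
  deriving (Eq, Show)

-- Target types.
data TTy
  = CInt | CUnit
  | CArr TTy TTy
  | CProd TTy TTy
  | CVar Name
  | CForall Name TTy
  deriving (Eq, Show)

-- The type translation of Definition 1.
translate :: Ty -> TTy
translate TInt            = CInt
translate TTop            = CUnit
translate (TArr a b)      = CArr (translate a) (translate b)
translate (TAnd a b)      = CProd (translate a) (translate b)
translate (TRcd _ a)      = translate a             -- labels are erased
translate (TVar a)        = CVar a
translate TBot            = CForall "a" (CVar "a")  -- uninhabited type
translate (TForall a _ b) = CForall a (translate b) -- constraint is dropped

main :: IO ()
main = do
  print (translate (TAnd (TRcd "width" TInt) (TRcd "depth" TInt)))
  print (translate (TForall "b" (TVar "a") (TVar "b")))
```

The second example shows the information loss discussed later: the disjointness constraint on the quantifier does not survive translation.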
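
e ⟶ e′ (Single-step reduction)

\[
\begin{array}{ll}
\text{R-FORALL} & (co_\forall\, v)\, \tau \longrightarrow co\, (v\, \tau) \\[1.5ex]
\text{R-TOPALL} & (\mathsf{top}_\forall\, \langle\rangle)\, \tau \longrightarrow \langle\rangle \\[1.5ex]
\text{R-DISTALL} & (\mathsf{dist}_\forall\, \langle v_1, v_2 \rangle)\, \tau \longrightarrow \langle v_1\, \tau, v_2\, \tau \rangle \\[1.5ex]
\text{R-TAPP} & (\Lambda \alpha.\, e)\, \tau \longrightarrow [\tau/\alpha]e \\[1.5ex]
\text{R-APP} & (\lambda x.\, e)\, v \longrightarrow [v/x]e \\[1.5ex]
\text{R-CTXT} & \dfrac{e \longrightarrow e'}{\mathcal{E}[e] \longrightarrow \mathcal{E}[e']}
\end{array}
\]


Fig. 8. Selected reduction rules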


Coercions and coercive subtyping. We follow prior work [5,6] by having a syntactic
category for coercions [22]. In Fig. 7, we have several new coercions: bot, co_∀,
dist_∀ and top_∀ due to the addition of polymorphism and bottom type. As seen
in Fig. 4 the coercive subtyping judgment has the form A <: B ⇝ co, which says
that the subtyping derivation for A <: B produces a coercion co that converts
terms of type |A| to |B|.


Fco static semantics. The typing rules of Fco are quite standard. We have one
rule T-CAPP regarding coercion application, which uses the judgment co :: τ ▹ τ′
to type coercions. We show two representative rules CT-FORALL and CT-BOT.

\[
\begin{array}{ll}
\text{T-CAPP} & \dfrac{\Phi; \Psi \vdash e : \tau \quad co :: \tau \triangleright \tau'}{\Phi; \Psi \vdash co\, e : \tau'} \\[1.5ex]
\text{CT-FORALL} & \dfrac{co :: \tau_1 \triangleright \tau_2}{co_\forall :: \forall \alpha.\, \tau_1 \triangleright \forall \alpha.\, \tau_2} \\[1.5ex]
\text{CT-BOT} & \mathsf{bot} :: \forall \alpha.\, \alpha \triangleright \tau
\end{array}
\]


Fco dynamic semantics. The dynamic semantics of Fco is mostly unremarkable.
We write e ⟶ e′ to mean one-step reduction. Figure 8 shows selected reduction
rules. The first line shows three representative rules regarding coercion
reductions. They do not contribute to computation but merely rearrange coercions.
Our coercion reduction rules are quite standard but not efficient in terms of
space. Nevertheless, there is existing work on space-efficient coercions [23,50],
which should be applicable to our work as well. Rule R-APP is the usual β-rule
that performs actual computation, and rule R-CTXT handles reduction under an
evaluation context. As usual, ⟶* is the reflexive, transitive closure of ⟶. Now
we can show that Fco is type safe:


Theorem 1 (Preservation). If •; • ⊢ e : τ and e ⟶ e′, then •; • ⊢ e′ : τ.


Theorem 2 (Progress). If •; • ⊢ e : τ, then either e is a value, or ∃e′. e ⟶ e′.


Elaboration. Now consider the translation parts in Fig. 5. The key idea of the
translation follows the prior work [2,5,16,36]: merges are elaborated to pairs (rule
T-MERGE); disjoint quantification and disjoint type applications (rules T-TABS
and T-TAPP) are elaborated to regular universal quantification and type
applications, respectively. Finally, the following lemma connects Fᵢ⁺ to Fco:


Lemma 2 (Elaboration soundness). We have that:
- If A <: B ⇝ co, then co :: |A| ▹ |B|.
- If Δ; Γ ⊢ E ⇒ A ⇝ e, then |Δ|; |Γ| ⊢ e : |A|.
- If Δ; Γ ⊢ E ⇐ A ⇝ e, then |Δ|; |Γ| ⊢ e : |A|.


4 Algorithmic System and Decidability 


The subtyping relation in Fig. 4 is highly non-algorithmic due to the presence of
a transitivity rule. This section presents an alternative algorithmic formulation.
Our algorithm extends that of λᵢ⁺, which itself was inspired by Pierce's decision
procedure [38], to handle disjoint quantifiers and the bottom type. We then prove
that the algorithm is sound and complete with respect to declarative subtyping.
Additionally we prove that the subtyping and disjointness relations are decidable.
Although the proofs of this fact are fairly straightforward, it is nonetheless
remarkable since it contrasts with the subtyping relation for (full) F<: [10], which
is undecidable [40]. Thus while bounded quantification is infamous for its
undecidability, disjoint quantification has the nicer property of being decidable.


4.1 Algorithmic Subtyping Rules 


While Fig. 4 is a fine specification of how subtyping should behave, it cannot be
read directly as a subtyping algorithm for two reasons: (1) the conclusions of
rules S-REFL and S-TRANS overlap with the other rules, and (2) the premises
of rule S-TRANS mention a type that does not appear in the conclusion. Simply
dropping the two offending rules from the system is not possible without losing
expressivity [29]. Thus we need a different approach. Following λᵢ⁺, we intend
the algorithmic judgment Q ⊢ A <: B to be equivalent to A <: Q ⇒ B, where Q
is a queue used to track record labels, domain types and disjointness constraints.
The full rules of the algorithmic subtyping of Fᵢ⁺ are shown in Fig. 9.


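Definition 2 (Q ::= [] | l, Q | B, Q | α * B, Q). Q ⇒ A is defined as follows:

\[
\begin{array}{rcl@{\qquad}rcl}
[] \Rightarrow A &=& A & (B, Q) \Rightarrow A &=& B \to (Q \Rightarrow A) \\
(l, Q) \Rightarrow A &=& \{l : Q \Rightarrow A\} & (\alpha * B, Q) \Rightarrow A &=& \forall (\alpha * B).\, Q \Rightarrow A
\end{array}
\]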
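
Definition 2 rebuilds a type from a queue by wrapping the result type from the inside out. A direct transcription, with our own names QItem and applyQueue, might look as follows.

```haskell
type Label = String
type Name  = String

data Ty
  = TInt | TTop | TBot
  | TArr Ty Ty
  | TAnd Ty Ty
  | TRcd Label Ty
  | TVar Name
  | TForall Name Ty Ty      -- forall (a * A). B
  deriving (Eq, Show)

-- Queue elements: a record label, a domain type, or a constrained variable.
data QItem
  = QLabel Label
  | QType Ty
  | QVar Name Ty
  deriving (Eq, Show)

-- applyQueue q a computes Q => A.
applyQueue :: [QItem] -> Ty -> Ty
applyQueue []               a = a
applyQueue (QType b : q)    a = TArr b (applyQueue q a)
applyQueue (QLabel l : q)   a = TRcd l (applyQueue q a)
applyQueue (QVar x b : q)   a = TForall x b (applyQueue q a)

main :: IO ()
main =
  -- (identity, Int) => {width : Int}  is  {identity : Int -> {width : Int}}
  print (applyQueue [QLabel "identity", QType TInt] (TRcd "width" TInt))
```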


For brevity of the algorithm, we use metavariable c to mean type constants:
Type constants c ::= Int | ⊥ | α


The basic idea of Q ⊢ A <: B is to perform a case analysis on B until it reaches
type constants. We explain new rules regarding disjoint quantification and the
bottom type. When a quantifier is encountered in B, rule A-FORALL pushes the
type variable with its disjointness constraint onto Q and continues with the
body. Correspondingly, in rule A-ALLCONST, when a quantifier is encountered
in A, and the head of Q is a type variable, this variable is popped out and we
continue with the body. Rule A-BOT is similar to its declarative counterpart.
Two meta-functions ⟦Q⟧⊤ and ⟦Q⟧& are meant to generate correct forms of
coercions, and their definitions are shown in the appendix. For other algorithmic
rules, we refer to λᵢ⁺ [5] for detailed explanations.
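
The queue-based case analysis can be sketched as a Boolean decision function. This is a simplified illustration rather than the paper's algorithm in full: it omits coercion generation and the quantifier rules A-FORALL and A-ALLCONST (so type variables are absent), and the names sub and QItem are ours.

```haskell
type Label = String

data Ty
  = TInt | TTop | TBot
  | TArr Ty Ty
  | TAnd Ty Ty
  | TRcd Label Ty
  deriving (Eq, Show)

data QItem = QLabel Label | QType Ty
  deriving (Eq, Show)

-- sub q a b decides Q |- A <: B for the monomorphic fragment.
sub :: [QItem] -> Ty -> Ty -> Bool
-- First, case analysis on B, pushing onto the end of the queue.
sub _ _ TTop         = True                              -- A-TOP
sub q a (TAnd b1 b2) = sub q a b1 && sub q a b2          -- A-AND
sub q a (TArr b1 b2) = sub (q ++ [QType b1]) a b2        -- A-ARR
sub q a (TRcd l b)   = sub (q ++ [QLabel l]) a b         -- A-RCD
-- B is now a constant (Int or bottom): case analysis on A,
-- popping from the head of the queue.
sub _  TBot _        = True                              -- A-BOT
sub [] TInt TInt     = True                              -- A-CONST
sub q (TAnd a1 a2) c = sub q a1 c || sub q a2 c          -- A-ANDCONST
sub (QType a : q) (TArr a1 a2) c =
  sub [] a a1 && sub q a2 c                              -- A-ARRCONST
sub (QLabel l : q) (TRcd l' a) c =
  l == l' && sub q a c                                   -- A-RCDCONST
sub _ _ _            = False

main :: IO ()
main = do
  let r   = TRcd "l" TInt
      lhs = TAnd (TArr TInt TInt) (TArr TInt r)
      rhs = TArr TInt (TAnd TInt r)
  print (sub [] lhs rhs)                    -- distributivity over arrows
  print (sub [] TTop (TArr TTop TTop))      -- top is below top -> top
  print (sub [] TInt (TArr TInt TInt))      -- Int is not a function type
```

Note that no transitivity rule appears: the distributivity example succeeds because each constant in the supertype is searched for separately with the same queue.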
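
Q ⊢ A <: B ⇝ co (Algorithmic subtyping)

\[
\begin{array}{ll}
\text{A-TOP} & Q \vdash A <: \top \rightsquigarrow [\![Q]\!]^{\top} \circ \mathsf{top} \\[1.5ex]
\text{A-AND} & \dfrac{Q \vdash A <: B_1 \rightsquigarrow co_1 \quad Q \vdash A <: B_2 \rightsquigarrow co_2}{Q \vdash A <: B_1 \,\&\, B_2 \rightsquigarrow [\![Q]\!]^{\&} \circ \langle co_1, co_2 \rangle} \\[1.5ex]
\text{A-ARR} & \dfrac{Q, B_1 \vdash A <: B_2 \rightsquigarrow co}{Q \vdash A <: B_1 \to B_2 \rightsquigarrow co} \\[1.5ex]
\text{A-RCD} & \dfrac{Q, l \vdash A <: B \rightsquigarrow co}{Q \vdash A <: \{l : B\} \rightsquigarrow co} \\[1.5ex]
\text{A-FORALL} & \dfrac{Q, \alpha * B_1 \vdash A <: B_2 \rightsquigarrow co}{Q \vdash A <: \forall (\alpha * B_1).\, B_2 \rightsquigarrow co} \\[1.5ex]
\text{A-CONST} & [] \vdash c <: c \rightsquigarrow \mathsf{id} \\[1.5ex]
\text{A-BOT} & Q \vdash \bot <: c \rightsquigarrow \mathsf{bot} \\[1.5ex]
\text{A-ARRCONST} & \dfrac{[] \vdash A <: A_1 \rightsquigarrow co_1 \quad Q \vdash A_2 <: c \rightsquigarrow co_2}{A, Q \vdash A_1 \to A_2 <: c \rightsquigarrow co_1 \to co_2} \\[1.5ex]
\text{A-RCDCONST} & \dfrac{Q \vdash A <: c \rightsquigarrow co}{l, Q \vdash \{l : A\} <: c \rightsquigarrow co} \\[1.5ex]
\text{A-ANDCONST} & \dfrac{Q \vdash A_i <: c \rightsquigarrow co \quad i \in \{1, 2\}}{Q \vdash A_1 \,\&\, A_2 <: c \rightsquigarrow co \circ \pi_i} \\[1.5ex]
\text{A-ALLCONST} & \dfrac{[] \vdash A <: A_1 \quad Q \vdash A_2 <: c \rightsquigarrow co}{(\alpha * A, Q) \vdash \forall (\alpha * A_1).\, A_2 <: c \rightsquigarrow co_\forall}
\end{array}
\]


Fig. 9. Algorithmic subtyping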


Correctness of the algorithm. We prove that the algorithm is sound and complete
with respect to the specification. We refer the reader to our Coq formalization
for more details. We only show the two major theorems:


Theorem 3 (Soundness). If Q ⊢ A <: B ⇝ co then A <: Q ⇒ B ⇝ co.


Theorem 4 (Completeness). If A <: B ⇝ co, then ∃co′. [] ⊢ A <: B ⇝ co′.


4.2 Decidability 


Moreover, we prove that our algorithmic type system is decidable. To see this, 
first notice that the bidirectional type system is syntax-directed, so we only need 
to show decidability of algorithmic subtyping and disjointness. The full (manual) 
proofs for decidability can be found in the appendix. 


Lemma 3 (Decidability of algorithmic subtyping). Given Q, A and B, it
is decidable whether there exists co, such that Q ⊢ A <: B ⇝ co.


Lemma 4 (Decidability of disjointness checking). Given Δ, A and B, it
is decidable whether Δ ⊢ A * B.


One interesting observation here is that although our disjoint quantification
has a similar shape to bounded quantification ∀(α <: A). B in F<: [10],
subtyping for F<: is undecidable [40]. In F<:, the subtyping rule for
bounded quantification is:

\[
\dfrac{\Delta \vdash A_2 <: A_1 \qquad \Delta, \alpha <: A_2 \vdash B_1 <: B_2}{\Delta \vdash \forall (\alpha <: A_1).\, B_1 <: \forall (\alpha <: A_2).\, B_2}\ \text{FSUB-FORALL}
\]

Compared with rule S-FORALL, both rules are contravariant on bounded/disjoint
types, and covariant on the body. However, with bounded quantification it
is fundamental to track the bounds in the environment, which complicates the
design of the rules and makes subtyping undecidable with rule FSUB-FORALL.
Decidability can be recovered by employing an invariant rule for bounded
quantification (that is, by forcing A1 and A2 to be identical). Disjoint quantification
does not require such an invariant rule for decidability.


5 Establishing Coherence for Fᵢ⁺


In this section, we establish the coherence property for Fᵢ⁺. The proof strategy
mostly follows that of λᵢ⁺, but the construction of the heterogeneous logical
relation is significantly more complicated. Firstly in Sect. 5.1 we discuss why
adding BCD subtyping to disjoint polymorphism introduces significant
complications. In Sect. 5.2, we discuss why a natural extension of System F's logical
relation to deal with disjoint polymorphism fails. The technical difficulty is
well-foundedness, stemming from the interaction between impredicativity and
disjointness. Finally in Sect. 5.3, we present our (predicative) logical relation that
is specially crafted to prove coherence for Fᵢ⁺.


5.1 The Challenge 


Before we tackle the coherence of Fᵢ⁺, let us first consider how Fᵢ (and its
predecessor λᵢ) enforces coherence. Its essentially syntactic approach is to make sure
that there is at most one subtyping derivation for any two types. As an immediate
consequence, the produced coercions are uniquely determined and thus the
calculus is clearly coherent. Key to this approach is the invariant that the type
system only produces disjoint intersection types. As we mentioned in Sect. 3,
this invariant complicates the calculus and its metatheory, and leads to a weaker
substitution lemma. Moreover, the syntactic coherence approach is incompatible
with BCD subtyping, which leads to multiple subtyping derivations with
different coercions and requires a more general substitution lemma. To
accommodate BCD into λᵢ, Bi et al. [5] have created the λᵢ⁺ calculus and developed a
semantically-founded proof method based on logical relations. Because λᵢ⁺ does
not feature polymorphism, the problem at hand is to incorporate support for
polymorphism in this semantic approach to coherence, which turns out to be
more challenging than is apparent.
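
\[
\begin{array}{rcl}
(v_1, v_2) \in \mathcal{V}[\![\mathsf{Int}; \mathsf{Int}]\!] &\triangleq& \exists i.\ v_1 = v_2 = i \\
(v_1, v_2) \in \mathcal{V}[\![\tau_1 \to \tau_2; \tau_1' \to \tau_2']\!] &\triangleq& \forall (v, v') \in \mathcal{V}[\![\tau_1; \tau_1']\!].\ (v_1\, v, v_2\, v') \in \mathcal{E}[\![\tau_2; \tau_2']\!] \\
(\langle v_1, v_2 \rangle, v_3) \in \mathcal{V}[\![\tau_1 \times \tau_2; \tau_3]\!] &\triangleq& (v_1, v_3) \in \mathcal{V}[\![\tau_1; \tau_3]\!] \land (v_2, v_3) \in \mathcal{V}[\![\tau_2; \tau_3]\!] \\
(v_3, \langle v_1, v_2 \rangle) \in \mathcal{V}[\![\tau_3; \tau_1 \times \tau_2]\!] &\triangleq& (v_3, v_1) \in \mathcal{V}[\![\tau_3; \tau_1]\!] \land (v_3, v_2) \in \mathcal{V}[\![\tau_3; \tau_2]\!]
\end{array}
\]


Fig. 10. Selected cases from λᵢ⁺'s canonicity relation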


5.2 Impredicativity and Disjointness at Odds 


Figure 10 shows selected cases of canonicity, which is λᵢ⁺'s (heterogeneous) logical
relation used in the coherence proof. The definition captures that two values v1
and v2 of types τ1 and τ2 are in V⟦τ1; τ2⟧ iff either the types are disjoint or
the types are equal and the values are semantically equivalent. Because both
alternatives entail coherence, canonicity is key to λᵢ⁺'s coherence proof.


Well-foundedness issues. For Fᵢ⁺, we need to extend canonicity with additional
cases to account for universally quantified types. For reasons that will become
clear in Sect. 5.3, the type indices become source types (rather than target types
as in Fig. 10). A naive formulation of one such case is:

\[
(v_1, v_2) \in \mathcal{V}[\![\forall (\alpha * A_1).\, B_1; \forall (\alpha * A_2).\, B_2]\!] \triangleq
\forall C_1 * A_1, C_2 * A_2.\ (v_1\, |C_1|, v_2\, |C_2|) \in \mathcal{E}[\![[C_1/\alpha]B_1; [C_2/\alpha]B_2]\!]
\]

This case is problematic because it destroys the well-foundedness of λᵢ⁺'s logical
relation, which is based on structural induction on the type indices. Indeed, the
type [C1/α]B1 may well be larger than ∀(α * A1). B1.

However, System F's well-known parametricity logical relation [43] provides
us with a means to avoid this problem. Rather than performing the type
substitution immediately as in the above rule, we can defer it to a later point by
adding it to an extra parameter ρ of the relation, which accumulates the deferred
substitutions. This yields a modified rule where the type indices in the recursive
occurrences are indeed smaller:

\[
(v_1, v_2) \in \mathcal{V}[\![\forall (\alpha * A_1).\, B_1; \forall (\alpha * A_2).\, B_2]\!]_{\rho} \triangleq
\forall C_1 * A_1, C_2 * A_2.\ (v_1\, |C_1|, v_2\, |C_2|) \in \mathcal{E}[\![B_1; B_2]\!]_{\rho[\alpha \mapsto (C_1, C_2)]}
\]

Of course, the deferred substitution has to be performed eventually, to be precise
when the type indices are type variables.

\[
(v_1, v_2) \in \mathcal{V}[\![\alpha; \alpha]\!]_{\rho} \triangleq (v_1, v_2) \in \mathcal{V}[\![\rho_1(\alpha); \rho_2(\alpha)]\!]_{\emptyset}
\]


Unfortunately, this way we have not only moved the type substitution to the
type variable case, but also the ill-foundedness problem. Indeed, this problem is
also present in System F. The standard solution is to not fix the relation R by
which values at type α are related to V⟦ρ1(α); ρ2(α)⟧, but instead to make it a
parameter that is tracked by ρ. This yields the following two rules for disjoint
quantification and type variables:

\[
\begin{array}{l}
(v_1, v_2) \in \mathcal{V}[\![\forall (\alpha * A_1).\, B_1; \forall (\alpha * A_2).\, B_2]\!]_{\rho} \triangleq \forall C_1 * A_1, C_2 * A_2, R \subseteq C_1 \times C_2. \\
\qquad (v_1\, |C_1|, v_2\, |C_2|) \in \mathcal{E}[\![B_1; B_2]\!]_{\rho[\alpha \mapsto (C_1, C_2, R)]} \\
(v_1, v_2) \in \mathcal{V}[\![\alpha; \alpha]\!]_{\rho} \triangleq (v_1, v_2) \in \rho_R(\alpha)
\end{array}
\]

Now we have finally recovered the well-foundedness of the relation. It is again
structurally inductive on the size of the type indices.


Heterogeneous issues. We have not yet accounted for one major difference
between the parametricity relation, from which we have borrowed ideas, and
the canonicity relation, to which we have been adding. The former is homogeneous
(i.e., the types of the two values are the same) and therefore has one type
index, while the latter is heterogeneous (i.e., the two values may have different
types) and therefore has two type indices. Thus we must also consider cases like
V⟦α; Int⟧. A definition that seems to handle this case appropriately is:

\[
(v_1, v_2) \in \mathcal{V}[\![\alpha; \mathsf{Int}]\!]_{\rho} \triangleq (v_1, v_2) \in \mathcal{V}[\![\rho_1(\alpha); \mathsf{Int}]\!]_{\emptyset} \qquad (1)
\]

Here is an example to motivate it. Let E = Λ(α * ⊤). ((λx. x) : α & Int → α & Int).
We expect that E Int 1 evaluates to ⟨1, 1⟩. To prove that, we need to show (1, 1) ∈
V⟦α; Int⟧ under ρ = [α ↦ (Int, Int, R)]. According to Eq. (1), this is indeed the case. However, we
run into the ill-foundedness issue again, because ρ1(α) could be larger than α. Alas,
this time the parametricity relation has no solution for us.


5.3 The Canonicity Relation for Fᵢ⁺


In light of the fact that substitution in the logical relation seems unavoidable
in our setting, and that impredicativity is at odds with substitution, we turn to
predicativity: we change rule T-TAPP to its predicative version:

\[
\dfrac{\Delta; \Gamma \vdash E \Rightarrow \forall (\alpha * B).\, C \rightsquigarrow e \qquad \Delta \vdash t * B}{\Delta; \Gamma \vdash E\, t \Rightarrow [t/\alpha]C \rightsquigarrow e\, |t|}\ \text{T-TAPPMONO}
\]

where metavariable t ranges over monotypes (types minus disjoint quantification).
We do not believe that predicativity is a severe restriction in practice,
since many source languages (e.g., those based on the Hindley-Milner type
system [24,32] like Haskell and OCaml) are themselves predicative and do not
require the full generality of an impredicative core language.

Luckily, substitution with monotypes does not prevent well-foundedness.
Figure 11 defines the canonicity relation for Fᵢ⁺. The canonicity relation is a
family of binary relations over Fco values that are heterogeneous, i.e., indexed
by two Fᵢ⁺ types. Two points are worth mentioning. (1) An apparent difference
from λᵢ⁺'s logical relation is that our relation is now indexed by source types. The
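
\[
\begin{array}{rcl}
(v_1, v_2) \in \mathcal{V}[\![\mathsf{Int}; \mathsf{Int}]\!] &\triangleq& \exists i.\ v_1 = v_2 = i \\
(v_1, v_2) \in \mathcal{V}[\![\{l : A\}; \{l : B\}]\!] &\triangleq& (v_1, v_2) \in \mathcal{V}[\![A; B]\!] \\
(v_1, v_2) \in \mathcal{V}[\![A_1 \to B_1; A_2 \to B_2]\!] &\triangleq& \forall (v_2', v_1') \in \mathcal{V}[\![A_2; A_1]\!].\ (v_1\, v_1', v_2\, v_2') \in \mathcal{E}[\![B_1; B_2]\!] \\
(\langle v_1, v_2 \rangle, v_3) \in \mathcal{V}[\![A \,\&\, B; C]\!] &\triangleq& (v_1, v_3) \in \mathcal{V}[\![A; C]\!] \land (v_2, v_3) \in \mathcal{V}[\![B; C]\!] \\
(v_3, \langle v_1, v_2 \rangle) \in \mathcal{V}[\![C; A \,\&\, B]\!] &\triangleq& (v_3, v_1) \in \mathcal{V}[\![C; A]\!] \land (v_3, v_2) \in \mathcal{V}[\![C; B]\!] \\
(v_1, v_2) \in \mathcal{V}[\![\forall (\alpha * A_1).\, B_1; \forall (\alpha * A_2).\, B_2]\!] &\triangleq& \forall\, \bullet \vdash t * A_1 \,\&\, A_2.\ (v_1\, |t|, v_2\, |t|) \in \mathcal{E}[\![[t/\alpha]B_1; [t/\alpha]B_2]\!] \\
(v_1, v_2) \in \mathcal{V}[\![A; B]\!] &\triangleq& \mathsf{true} \quad \text{otherwise} \\
(e_1, e_2) \in \mathcal{E}[\![A; B]\!] &\triangleq& \exists v_1, v_2.\ e_1 \longrightarrow^{*} v_1 \land e_2 \longrightarrow^{*} v_2 \land (v_1, v_2) \in \mathcal{V}[\![A; B]\!]
\end{array}
\]

\[
\rho \in \mathcal{D}[\![\Delta]\!]: \qquad
\emptyset \in \mathcal{D}[\![\bullet]\!]
\qquad
\dfrac{\rho \in \mathcal{D}[\![\Delta]\!] \quad \bullet \vdash t * \rho(B)}{\rho[\alpha \mapsto t] \in \mathcal{D}[\![\Delta, \alpha * B]\!]}
\]

\[
(\gamma_1, \gamma_2) \in \mathcal{G}[\![\Gamma]\!]_{\rho}: \qquad
(\emptyset, \emptyset) \in \mathcal{G}[\![\bullet]\!]_{\rho}
\qquad
\dfrac{(\gamma_1, \gamma_2) \in \mathcal{G}[\![\Gamma]\!]_{\rho} \quad (v_1, v_2) \in \mathcal{V}[\![\rho(A); \rho(A)]\!]}{(\gamma_1[x \mapsto v_1], \gamma_2[x \mapsto v_2]) \in \mathcal{G}[\![\Gamma, x : A]\!]_{\rho}}
\]


Fig. 11. The canonicity relation for Fᵢ⁺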


reason is that the type translation function (Definition 1) discards disjointness
constraints, which are crucial in our setting, whereas λᵢ⁺'s type translation does
not have information loss. (2) Heterogeneity allows relating values of different
types, and in particular values whose types are disjoint. The rationale behind
the canonicity relation is to combine equality checking from traditional
(homogeneous) logical relations with disjointness checking. It consists of two relations:
the value relation V⟦A; B⟧ relates closed values; and the expression relation
E⟦A; B⟧, defined in terms of the value relation, relates closed expressions.
The relation V⟦A; B⟧ is defined by induction on the structures of A and B.
For integers, it requires the two values to be literally the same. For two records to
behave the same, their fields must behave the same. For two functions to behave
the same, they are required to produce outputs related at B1 and B2 when given
related inputs at A1 and A2. For the next two cases regarding intersection types,
the relation distributes over the intersection constructor &. Of particular interest is
the case for disjoint quantification. Notice that it does not quantify over arbitrary
relations, but directly substitutes α with monotype t in B1 and B2. This means
that our canonicity relation does not entail parametricity. However, it suffices
for our purposes to prove coherence. Another noticeable thing is that we keep
the invariant that A and B are closed types throughout the relation, so we no
longer need to consider type variables. This simplifies things a lot. Note that
when one type is ⊥, two values are vacuously related because there simply are
no values of type ⊥. We need to show that the relation is indeed well-founded:


Lemma 5 (Well-foundedness). The canonicity relation of $F_i^+$ is well-founded.


Proof. Let $|\cdot|_\forall$ and $|\cdot|_s$ be the number of ∀-quantifiers and the size of types,
respectively. Consider the measure $\langle |\cdot|_\forall, |\cdot|_s \rangle$, where $\langle \ldots \rangle$ denotes lexicographic order.
For the case of disjoint quantification, the number of ∀-quantifiers decreases. For
the other cases, the measure $|\cdot|_\forall$ does not increase, and the measure $|\cdot|_s$
strictly decreases.
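
The lexicographic argument can be checked on a small instance. The following is only an illustrative sketch: the mini-AST (`Int`, `Var`, `Arrow`, `And`, `Forall`), the helper names, and the choice to sum the measure over both type indices are assumptions made for this example and are not taken from the Coq formalization. It shows that instantiating a quantifier with a closed monotype can increase the size component while the pair still decreases, because the quantifier count drops.

```python
from dataclasses import dataclass

# Hypothetical mini-AST for a fragment of the types (illustrative names only).
@dataclass(frozen=True)
class Int:
    pass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Forall:          # forall (var * bound). body
    var: str
    bound: object
    body: object

def children(t):
    """Immediate subterms of a type."""
    if isinstance(t, Arrow):
        return [t.dom, t.cod]
    if isinstance(t, And):
        return [t.left, t.right]
    if isinstance(t, Forall):
        return [t.bound, t.body]
    return []

def quantifiers(t):
    """Number of forall-quantifiers occurring in t."""
    own = 1 if isinstance(t, Forall) else 0
    return own + sum(quantifiers(c) for c in children(t))

def size(t):
    """Number of AST nodes in t."""
    return 1 + sum(size(c) for c in children(t))

def measure(a, b):
    """Assumed measure on a pair of indices; Python tuples compare lexicographically."""
    return (quantifiers(a) + quantifiers(b), size(a) + size(b))

def subst(t, var, mono):
    """[mono/var]t for a closed monotype mono (so no capture can occur)."""
    if isinstance(t, Var):
        return mono if t.name == var else t
    if isinstance(t, Arrow):
        return Arrow(subst(t.dom, var, mono), subst(t.cod, var, mono))
    if isinstance(t, And):
        return And(subst(t.left, var, mono), subst(t.right, var, mono))
    if isinstance(t, Forall):
        body = t.body if t.var == var else subst(t.body, var, mono)
        return Forall(t.var, subst(t.bound, var, mono), body)
    return t

# forall (a * Int). a -> a, instantiated with the monotype Int -> Int
poly = Forall("a", Int(), Arrow(Var("a"), Var("a")))
inst = subst(poly.body, poly.var, Arrow(Int(), Int()))

before = measure(poly, poly)   # (2, 10)
after = measure(inst, inst)    # (0, 14): larger size, fewer quantifiers
print(before, after, after < before)
```

The size component grows from 10 to 14 here, which is why size alone would not be a valid measure and the quantifier count has to come first in the lexicographic order.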
5.4 Establishing Coherence


Logical equivalence. The canonicity relation can be lifted to open expressions in 
the standard way, i.e., by considering all possible interpretations of free type and 
term variables. The logical interpretations of type and term contexts are found 
in the bottom half of Fig. 11. 


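Definition 3 (Logical equivalence $\simeq_{log}$)


\[
\begin{array}{l}
\Delta; \Gamma \vdash e_1 \simeq_{log} e_2 : A; B \;\triangleq\; |\Delta|; |\Gamma| \vdash e_1 : |A| \;\wedge\; |\Delta|; |\Gamma| \vdash e_2 : |B| \;\wedge \\
\quad (\forall \rho, \gamma_1, \gamma_2.\; \rho \in \mathcal{D}[\![\Delta]\!] \wedge (\gamma_1, \gamma_2) \in \mathcal{G}[\![\Gamma]\!]_\rho \Longrightarrow (\gamma_1(\rho_1(e_1)), \gamma_2(\rho_2(e_2))) \in \mathcal{E}[\![\rho(A); \rho(B)]\!])
\end{array}
\]


For conciseness, we write $\Delta; \Gamma \vdash e_1 \simeq_{log} e_2 : A$ to mean $\Delta; \Gamma \vdash e_1 \simeq_{log} e_2 : A; A$.

Contextual equivalence. Following $\lambda_i^+$, the notion of coherence is based on contextual
equivalence. The intuition is that two programs are equivalent if we cannot
tell them apart in any context. As usual, contextual equivalence is expressed
using expression contexts ($C$ and $D$ denote $F_i^+$ and $F_{co}$ expression contexts,
respectively). Due to the bidirectional nature of the type system, the typing
judgment of $C$ features 4 different forms (full rules are in the appendix), e.g.,
$C : (\Delta; \Gamma \Rightarrow A) \mapsto (\Delta'; \Gamma' \Rightarrow A') \rightsquigarrow D$ reads: if $\Delta; \Gamma \vdash E \Rightarrow A$ then
$\Delta'; \Gamma' \vdash C\{E\} \Rightarrow A'$. The judgment also generates a well-typed $F_{co}$ context
$D$. The following two definitions capture the notion of contextual equivalence: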


Definition 4 (Kleene Equality $\simeq$). Two complete programs (i.e., closed
terms of type Int), $e$ and $e'$, are Kleene equal, written $e \simeq e'$, iff there exists an
integer $i$ such that $e \longrightarrow^* i$ and $e' \longrightarrow^* i$.


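Definition 5 (Contextual Equivalence $\simeq_{ctx}$)


\[
\begin{array}{l}
\Delta; \Gamma \vdash E_1 \simeq_{ctx} E_2 : A \;\triangleq\; \forall e_1, e_2.\; \Delta; \Gamma \vdash E_1 \Rightarrow A \rightsquigarrow e_1 \;\wedge\; \Delta; \Gamma \vdash E_2 \Rightarrow A \rightsquigarrow e_2 \Longrightarrow \\
\quad (\forall C, D.\; C : (\Delta; \Gamma \Rightarrow A) \mapsto (\bullet; \bullet \Rightarrow \mathsf{Int}) \rightsquigarrow D \Longrightarrow D\{e_1\} \simeq D\{e_2\})
\end{array}
\]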


Coherence. For space reasons, we directly show the coherence statement of $F_i^+$.
We need several technical lemmas such as compatibility lemmas, fundamental 
property, etc. The interested reader can refer to our Coq formalization. 


Theorem 5 (Coherence). We have that


- If $\Delta; \Gamma \vdash E \Rightarrow A$ then $\Delta; \Gamma \vdash E \simeq_{ctx} E : A$.
- If $\Delta; \Gamma \vdash E \Leftarrow A$ then $\Delta; \Gamma \vdash E \simeq_{ctx} E : A$.


That is, coherence is a special case of Definition 5 where $E_1$ and $E_2$ are the same.
At first glance, this appears underwhelming: of course $E$ behaves the same as
itself! The tricky part is that, if we expand it according to Definition 5, it is not
$E$ itself but all its translations $e_1$ and $e_2$ that behave the same!

6 Related Work


Coherence. In calculi featuring coercive subtyping, a semantics that interprets 
the subtyping judgment by introducing explicit coercions is typically defined on 
typing derivations rather than on typing judgments. A natural question that 
arises for such systems is whether the semantics is coherent, i.e., distinct typ- 
ing derivations of the same typing judgment possess the same meaning. Since 
Reynolds [45] proved the coherence of a calculus with intersection types, many 
researchers have studied the problem of coherence in a variety of typed calculi. 
Two approaches are commonly found in the literature. The first approach is to 
find a normal form for a representation of the derivation and show that normal 
forms are unique for a given typing judgment [8, 15,47]. However, this approach 
cannot be directly applied to Curry-style calculi (where the lambda abstractions 
are not type annotated). Biernacki and Polesiuk [6] considered the coherence 
problem of coercion semantics. Their criterion for coherence of the translation 
is contextual equivalence in the target calculus. Inspired by this approach, Bi 
et al. [5] proposed the canonicity relation to prove coherence for a calculus with 
disjoint intersection types and BCD subtyping. As we have shown in Sect. 5, 
constructing a suitable logical relation for $F_i^+$ is challenging. On the one hand,
the original approach by Alpuim et al. [2] in $F_i$ does not work any more due to
the addition of BCD subtyping. On the other hand, simply combining System
F’s logical relation with $\lambda_i^+$'s canonicity relation does not work as expected, due
to the issue of well-foundedness. To solve the problem, we employ immediate 
substitutions and a restriction to predicative instantiations. 


BCD subtyping and decidability. The BCD type system was first introduced 
by Barendregt et al. [3] to characterize exactly the strongly normalizing terms. 
The BCD type system features a powerful subtyping relation, which serves as 
a base for our subtyping relation. The decidability of BCD subtyping has been 
shown in several works [27,38,41,51]. Laurent [28] formalized the relation in 
Coq in order to eliminate transitivity cuts from it, but his formalization does 
not deliver an algorithm. Only recently, Laurent [30] presented a general way 
of defining a BCD-like subtyping relation extended with generic contravariant/covariant
type constructors that enjoys the “sub-formula property”. Our Coq
formalization extends the approach used in $\lambda_i^+$, which follows a different idea
based on Pierce’s decision procedure [38], with parametric (disjoint) polymor- 
phism and corresponding distributivity rules. More recently, Muehlboeck and 
Tate [34] presented a decidable algorithmic system (proved in Coq) with union 
and intersection types. Similar to $F_i^+$, their system also has distributive subtyping
rules. They also discussed the addition of polymorphism, but left a Coq formal- 
ization for future work. In their work they regard intersections of disjoint types 
(e.g., String & Int) as uninhabitable, which is different from our interpretation. 
As a consequence, coherence is a non-issue for them. 


Fig. 12. Summary of intersection calculi (● = yes, ○ = no, ◐ = syntactic coherence)


Intersection types, the merge operator and polymorphism. Forsythe [44] has
intersection types and a merge-like operator. However, to ensure coherence, various
restrictions were added to limit the use of merges. In Forsythe merges cannot
contain more than one function. Castagna et al. [12] proposed a coherent calculus
λ& to study overloaded functions. λ& has a special merge operator that
works on functions only. Dunfield proposed a calculus [16] (which we call $\lambda_{,,}$)
that shows significant expressiveness of type systems with unrestricted intersection
types and an (unrestricted) merge operator. However, because of his unrestricted
merge operator (allowing 1 ,, 2), his calculus lacks coherence.
Blaauwbroek’s calculus [7] enriched $\lambda_{,,}$ with BCD subtyping and computational effects, but
he did not address coherence. The coherence issue for a calculus similar to $\lambda_{,,}$
was first addressed in $\lambda_i$ [36] with the notion of disjointness, but at the cost
of dropping unrestricted intersections, and with a strict notion of coherence (based
on α-equivalence). Later Bi et al. [5] improved calculi with disjoint intersection
types by removing several restrictions, and adopted BCD subtyping and a semantic
notion of coherence (based on contextual equivalence) proved using canonicity.
The combination of intersection types, a merge operator and parametric polymorphism,
while achieving coherence, was first studied in $F_i$ [2], which serves as
a foundation for $F_i^+$. However, $F_i$ suffered the same problems as $\lambda_i$. Additionally,
in $F_i$ a bottom type is problematic due to interactions with disjoint polymorphism
and the lack of unrestricted intersections. The issues can be illustrated
with the well-typed $F_i^+$ expression $\Lambda(\alpha * \bot).\, \lambda x : \alpha.\; x \,,,\, x$. In this expression the
type of $x \,,,\, x$ is $\alpha \,\&\, \alpha$. Such a merge does not violate disjointness because the
only types that $\alpha$ can be instantiated with are top-like, and top-like types do not
introduce incoherence. In $F_i$ a type variable $\alpha$ can never be disjoint to another
type that contains $\alpha$, but (as the previous expression shows) the addition of a
bottom type allows expressions where such a (strict) condition does not hold. In
this work, we removed those restrictions, extended BCD subtyping with polymorphism,
and proposed a more powerful logical relation for proving coherence.
Figure 12 summarizes the main differences between the aforementioned calculi. 

There are also several other calculi with intersections and polymorphism. 
Pierce proposed $F_\wedge$ [39], a calculus combining intersection types and bounded
quantification. Pierce translates $F_\wedge$ to System F extended with products, but
he left coherence as a conjecture. More recently, Castagna et al. [14] proposed a 
polymorphic calculus with set-theoretic type connectives (intersections, unions, 
negations). But their calculus does not include a merge operator. Castagna and
Lanvin also proposed a gradual type system [13] with intersection and union
types, but also without a merge operator. 


Row polymorphism and bounded polymorphism. Row polymorphism was origi- 
nally proposed by Wand [54] as a mechanism to enable type inference for a sim- 
ple object-oriented language based on recursive records. These ideas were later 
adopted into type systems for extensible records [19,21,31]. Our merge operator 
can be seen as a generalization of record extension/concatenation, and selection 
is also built-in. In contrast to most record calculi, restriction is not a primitive 
operation in $F_i^+$, but can be simulated via subtyping. Disjoint quantification can
simulate the lacks predicate often present in systems with row polymorphism. 
Recently Morris and McKinna presented a typed language [33], generalizing 
and abstracting existing systems of row types and row polymorphism. Alpuim 
et al. [2] informally studied the relationship between row polymorphism and disjoint
polymorphism, but it would be interesting to study this relationship more
formally. The work of Morris and McKinna may be interesting for such a study in
that it gives a general framework for row type systems. 

Bounded quantification is currently the dominant mechanism in major main- 
stream object-oriented languages supporting both subtyping and polymorphism. 
$F_{<:}$ [10] provides a simple model for bounded quantification, but type-checking
in full $F_{<:}$ is proved to be undecidable [40]. Pierce’s thesis [39] discussed the relationship
between calculi with simple polymorphism and intersection types and
bounded quantification. He observed that there is a way to “encode” many forms 
of bounded quantification in a system with intersections and pure (unbounded) 
second-order polymorphism. That encoding can be easily adapted to $F_i^+$:


\[
\forall(\alpha <: A).\, B \;\triangleq\; \forall(\alpha * \top).\, [A \,\&\, \alpha / \alpha]\, B
\]


The idea is to replace bounded quantification by (unrestricted) universal quantification
and all occurrences of $\alpha$ by $A \,\&\, \alpha$ in the body. Such an encoding seems
to indicate that $F_i^+$ could be used as a decidable alternative to (full) $F_{<:}$. It
is worthwhile to note that this encoding does not work in $F_i$ because $A \,\&\, \alpha$ is
not well-formed ($\alpha$ is not disjoint to $A$). In other words, the encoding requires
unrestricted intersections.
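
As a rough illustration of the encoding, the following sketch implements it as a translation between two hypothetical type ASTs. All class and function names are invented for this example; the sketch also assumes that bound variable names are distinct from the free variables of the bounds (so the substitution cannot capture), and it is not part of the paper's formalization.

```python
from dataclasses import dataclass

# Hypothetical ASTs: BoundedForall is source-only, DisjointForall is target-only.
@dataclass(frozen=True)
class Top:
    pass

@dataclass(frozen=True)
class Int:
    pass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class BoundedForall:      # forall (var <: bound). body
    var: str
    bound: object
    body: object

@dataclass(frozen=True)
class DisjointForall:     # forall (var * disjoint). body
    var: str
    disjoint: object
    body: object

def subst(ty, var, repl):
    """[repl/var]ty on target types, assuming no variable capture can occur."""
    if isinstance(ty, Var):
        return repl if ty.name == var else ty
    if isinstance(ty, Arrow):
        return Arrow(subst(ty.dom, var, repl), subst(ty.cod, var, repl))
    if isinstance(ty, And):
        return And(subst(ty.left, var, repl), subst(ty.right, var, repl))
    if isinstance(ty, DisjointForall):
        body = ty.body if ty.var == var else subst(ty.body, var, repl)
        return DisjointForall(ty.var, subst(ty.disjoint, var, repl), body)
    return ty  # Top, Int

def encode(ty):
    """Replace every bounded quantifier by a disjoint quantifier with bound Top."""
    if isinstance(ty, Arrow):
        return Arrow(encode(ty.dom), encode(ty.cod))
    if isinstance(ty, And):
        return And(encode(ty.left), encode(ty.right))
    if isinstance(ty, DisjointForall):
        return DisjointForall(ty.var, encode(ty.disjoint), encode(ty.body))
    if isinstance(ty, BoundedForall):
        bound, body = encode(ty.bound), encode(ty.body)
        # every occurrence of the variable in the body becomes (bound & variable)
        return DisjointForall(ty.var, Top(), subst(body, ty.var, And(bound, Var(ty.var))))
    return ty

# forall (a <: Int). a -> a   becomes   forall (a * Top). (Int & a) -> (Int & a)
source = BoundedForall("a", Int(), Arrow(Var("a"), Var("a")))
target = encode(source)
print(target)
```

The replacement `Int & a` mentions `a` itself, so the substitution must be a single simultaneous pass rather than an iterated rewrite; the recursive `subst` above behaves that way.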


7 Conclusion and Future Work 


We have proposed $F_i^+$, a type-safe and coherent calculus with disjoint intersection
types, BCD subtyping and parametric polymorphism. $F_i^+$ improves the state of the
art of compositional designs, and enables the development of highly modular and
reusable programs. One interesting and useful further extension would be implicit
polymorphism. For that we want to combine Dunfield and Krishnaswami’s approach
[17] with our bidirectional type system. We would also like to study the
parametricity of $F_i^+$. As we have seen in Sect. 5.2, it is not at all obvious how to
extend the standard logical relation of System F to account for disjointness, and 
avoid potential circularity due to impredicativity. A promising solution is to use 
step-indexed logical relations [1].

Abstract. A cornerstone of the theory of λ-calculus is that intersection
types characterise termination properties. They are a flexible tool that 
can be adapted to various notions of termination, and that also induces 
adequate denotational models. 

Since the seminal work of de Carvalho in 2007, it is known that multi 
types (i.e. non-idempotent intersection types) refine intersection types 
with quantitative information and a strong connection to linear logic. 
Typically, type derivations provide bounds for evaluation lengths, and 
minimal type derivations provide exact bounds. 

De Carvalho studied call-by-name evaluation, and Kesner used his 
system to show the termination equivalence of call-by-need and call-by- 
name. De Carvalho’s system, however, cannot provide exact bounds on 
call-by-need evaluation lengths. 

In this paper we develop a new multi type system for call-by-need. Our 
system produces exact bounds and induces a denotational model of call- 
by-need, providing the first tight quantitative semantics of call-by-need. 


1 Introduction 


Duplications and erasures have always been considered as key phenomena in 
the λ-calculus (the λI-calculus, where erasures are forbidden, is an example of
this). The advent of linear logic [38] gave them a new, prominent logical status.
Forbidding erasure and duplication enables single-use resources, i.e. linearity, 
but limits expressivity, as every computation terminates in linear time. Their 
controlled reintroduction via the non-linear modality ! recovers the full expressive 
power of cut-elimination and allows a fine analysis of resource consumption. 
Duplication and erasure are therefore the key ingredients for logical expressivity, 
and, via Curry-Howard, for the expressivity of the λ-calculus. They are also
essential to understand evaluation strategies. 

In a λ-term there can be many β-redexes, that is, places where β-reduction
can be applied. In this sense, the λ-calculus is non-deterministic.
Non-determinism does not affect the result of evaluation, if any, but it affects whether
evaluation terminates, and in how many steps. There are two natural determin- 
istic evaluation strategies, call-by-name (shortened to CbN) and call-by-value 
(CbV), which have dual behaviour with respect to duplication and erasure.

Call-by-Name = Silly Duplication + Wise Erasure. CbN never evaluates arguments
of β-redexes before the redexes themselves. As a consequence, it never
evaluates in subterms that will be erased. This is wise, and makes CbN a normalising
strategy, that is, a strategy that reaches a result whenever one exists.¹
A second consequence is that if the argument of the redex is duplicated then it 
may be evaluated more than once. This is silly, as it repeats work already done. 


Call-by-Value = Wise Duplication + Silly Erasure. CbV, on the other
hand, always evaluates arguments of β-redexes before the redexes themselves.
Consequently, arguments are not re-evaluated (this is wise with respect to
duplication) but they are also evaluated when they are going to be erased. For
instance, on t := (λx.λy.y)Ω, where Ω is the famous looping λ-term, CbV evaluation
diverges (it keeps evaluating Ω) while CbN converges in one β-step (simply
erasing Ω). This CbV treatment of erasure is clearly as silly as the duplicated
work of CbN.


Call-by-Need = Wise Duplication + Wise Erasure. It is natural to try to combine 
the advantages of both CbN and CbV. The strategy that is wise with respect 
to both duplications and erasures is usually called call-by-need (CbNeed); it was
introduced by Wadsworth [57] in the ’70s. Despite being at the
core of Haskell, one of the most-used functional programming languages, and— 
in its strong variant—being at work in the kernel of Coq as designed by Barras 
[16], the theory of CbNeed is much less developed than that of CbN or CbV. 

One of the reasons for this is that it cannot be defined inside the λ-calculus
without some hacking. Manageable presentations of CbNeed indeed require first-class
sharing and micro-step operational semantics where variable occurrences
are replaced one at a time (when needed), and not all at once as in the λ-calculus.
Another reason is the less natural logical interpretation. 


Linear Logic, Names, Values, and Needs. CbN and CbV have neat interpretations
in linear logic. They correspond to two different representations of intuitionistic
logic in linear logic, based on two different representations of implication.²

The logical interpretation of CbNeed, studied by Maraist et al. in [47], is
less neat than those of CbN and CbV. Within linear logic, CbNeed is usually 
understood as corresponding to the CbV representation where erasures are gen- 
eralised to all terms, not only those under the scope of a ! modality. So, it is seen 
as a sort of affine CbV. Such an interpretation however is unusual, because it 
does not match exactly with cut-elimination in linear logic, as for CbN and CbV. 


1 If a term t admits both converging and diverging evaluation sequences then the
diverging sequences occur in erasable subterms of t, which is why CbN avoids them.

2 The CbN translation maps $A \Rightarrow B$ to $(!A^{CbN}) \multimap B^{CbN}$, while the CbV one maps it to
$!A^{CbV} \multimap {!B^{CbV}}$ or equivalently to $!(A^{CbV} \multimap B^{CbV})$.


412 B. Accattoli et al.


Call-by-Need, Abstractly. The main theorem of the theory of CbNeed is that it is
termination equivalent to CbN, that is, on a fixed term, CbNeed evaluation terminates
if and only if CbN evaluation terminates, and, moreover, they essentially
produce the same result (up to some technical details that are irrelevant here).
This is due to the fact that both strategies avoid silly divergent sequences such 
as that of (λx.λy.y)Ω. Termination equivalence is an abstract theorem stating
that CbNeed erases as wisely as CbN. Curiously, in the literature there are no 
abstract theorems reflecting the dual fact that CbNeed duplicates as wisely as 
CbV—we provide one, as a side contribution of this paper. 


Call-by-Need and Denotational Semantics. CbNeed is then usually considered 
as a CbV optimisation of CbN. In particular, every denotational model of CbN 
is also a model of CbNeed, and adequacy—that is the fact that the denotation of 
t is not degenerated if and only if t terminates—transfers from CbN to CbNeed. 

Denotational semantics is invariant by evaluation, and so is insensitive 
to evaluation lengths by definition. It then seems that denotational seman- 
tics cannot distinguish between CbN and CbNeed. The aim of this paper is, 
somewhat counter-intuitively, to separate CbN and CbNeed semantically. We 
develop a type system whose type judgements induce a model—this is typ- 
ical of intersection type systems—and whose type derivations provide exact 
bounds for CbNeed evaluation—this is usually obtained via non-idempotent 
intersection types. Unsurprisingly, the design of the type system requires a del- 
icate mix of erasure and duplication and builds on the linear logic understand- 
ing of CbN and CbV. 


Multi Types. Our typing framework is given by multi types, which is an alternative
name for non-idempotent intersection types.³ Multi types characterise termination
properties exactly as intersection types, having moreover the advantages
that they are closely related to (the relational semantics of) linear logic, their 
type derivations provide quantitative information about evaluation lengths, and 
the proof techniques are simpler—no need for the reducibility method. 

The seminal work of de Carvalho [23] (appeared in 2007 but unpublished until 
2018, see also [22]) showed how to use multi types to obtain exact bounds on 
evaluation lengths in CbN. Ehrhard adapted multi types to CbV [34], and very 
recently Accattoli and Guerrieri adapted de Carvalho’s study of exact bounds to 
Ehrhard’s system and CbV evaluation [8]. Kesner used de Carvalho’s CbN multi 
types to obtain a simple proof that CbNeed is termination equivalent to CbN 
[40] (first proved with other techniques by Maraist, Odersky, and Wadler [48] 
and Ariola and Felleisen [11] in the nineties), and then Kesner and coauthors 
continued exploring the theory of CbNeed via CbN multi types [14, 15, 42].

Kesner’s use of CbN multi types to study CbNeed is qualitative, as it deals 
with termination and not with exact bounds. For a quantitative study of CbNeed, 
de Carvalho’s CbN system cannot really be informative: CbN multi types provide 
bounds for CbNeed which cannot be exact because they already provide exact 
bounds for CbN, which generally takes more steps than CbNeed. 


3 The new terminology is due to the fact that a non-idempotent intersection $A \wedge A \wedge B \wedge C$
can be seen as a multi-set [A, A, B, C].

Multi Types by Need. In this paper we provide the first multi type system characterising
CbNeed termination and whose minimal type derivations provide exact
bounds for CbNeed evaluation lengths. The design of the type system is delicate, 
as we explain in Sect.6. One of the key points is that, in contrast to Ehrhard’s 
system for CbV [34], multi types for CbNeed cannot be directly extracted by 
the relational semantics of linear logic, given that CbNeed does not have a clean 
representation in it. A by-product of our work is a new denotational semantics 
of CbNeed, the first one to precisely reflect its quantitative properties. 

Beyond the result itself, the paper tries to stress how the key ingredients of 
our type system are taken from those for CbN and CbV and combined together. 
To this aim, we first present multi types for CbN and CbV, and only then we 
proceed to build the CbNeed system and prove its properties. 

Along the way, we also prove the missing fundamental property of CbNeed, 
that is, that it duplicates as efficiently as CbV. The result dualizes the termi- 
nation equivalence of CbN and CbNeed, which shows that CbNeed erases as 
wisely as CbN. Careful: the CbV system is correct but of course not complete 
with respect to CbNeed, because CbNeed may normalise when CbV diverges. 
The proof of the result is straightforward, because of our presentations of CbV 
and CbNeed. We adopt a liberal, non-deterministic formulation of CbV, and
assume (without loss of generality, see [1]) that garbage collection is always
postponed. These two ingredients turn CbNeed into a fragment of CbV, so that
the new fundamental result follows as a corollary of the correctness of CbV multi types
for CbV evaluation.


Technical Development. The paper is extremely uniform, technically speaking. 
The three evaluations are presented as strategies of Accattoli and Kesner’s Linear 
Substitution Calculus (shortened to LSC) [1,6], a calculus with a simple but 
expressive form of explicit sharing. The LSC is strongly related to linear logic 
[2], and provides a neat and manageable presentation of CbNeed, introduced 
by Accattoli, Barenbaum, and Mazza in [3], and further developed by various 
authors in [4,5,10,14,15,40,42]. Our type systems count evaluation steps by 
annotating typing rules in the exact same way, and the proofs of correctness 
and completeness all follow the exact same structure. While the results for CbN 
are very minor variations with respect to those in the literature [7,23], those for 
CbV are the first ones with respect to a presentation of CbV with sharing. 

As is standard for CbNeed, we restrict our study to closed terms and
weak evaluation (that is, out of abstractions). The main consequence of this fact 
is that normal forms are particularly simple (sometimes called answers in the 
literature). Compared with other recent works dealing with exact bounds such 
as Accattoli, Graham-Lengrand, and Kesner [7] and Accattoli and Guerrieri [8] 
the main difference is that the size of normal forms is not taken into account by 
type derivations. This is because of the simple notions of normal forms in the 
closed and weak case, and not because the type systems are not accurate. 


Related Work About CbNeed. Call-by-need was introduced by Wadsworth [57]
in the ’70s. In the ’90s, it was first reformulated as operational semantics by
Launchbury [46], Maraist, Odersky, and Wadler [48], and Ariola and Felleisen
[11,12], and then implemented by Sestoft [55] and further studied by Kutzner
and Schmidt-Schauß [45]. More recent papers are Garcia, Lumsdaine, and Sabry
[36], Ariola, Herbelin, and Saurin [13], Chang and Felleisen [26], Danvy and 
Zerny [29], Downen et al. [33], Pédrot and Saurin [53], and Balabonski et al. [14]. 


Related Work About Multi Types. Intersection types are a standard tool to study
λ-calculi; see Coppo and Dezani [27,28], Pottinger [54], and Krivine [44].
Non-idempotent intersection types, i.e. multi types, were first considered by Gardner
[37], and then by Kfoury [43], Neergaard and Mairson [50], and de Carvalho
[23]; a survey is Bucciarelli, Kesner, and Ventura [20].

Many recent works rely on multi types or relational semantics to study prop- 
erties of programs and proofs. Beyond the cited ones, Diaz-Caro, Manzonetto, 
and Pagani [32], Carraro and Guerrieri [21], Ehrhard and Guerrieri [35], and 
Guerrieri [39] deal with CbV, while Bernadet and Lengrand [17], de Carvalho, 
Pagani, and Tortora de Falco [24] provide exact bounds. Further related work is 
by Bucciarelli, Ehrhard, and Manzonetto [18], de Carvalho and Tortora de Falco 
[25], Tsukada and Ong [56], Kesner and Vial [41], Piccolo, Paolini and Ronchi 
Della Rocca [52], Ong [51], Mazza, Pellissier, and Vial [49], Bucciarelli, Kesner 
and Ronchi Della Rocca [19]—this list is not exhaustive. 


Proofs. Proofs are omitted. They can be found in the technical report [9]. 


2 Closed λ-Calculi


In this section we define the CbN, CbV, and CbNeed evaluation strategies. We
present them in the context of Accattoli and Kesner’s linear substitution calculus
(LSC) [1,6]. We mainly follow the uniform presentation of these strategies
given by Accattoli, Barenbaum, and Mazza [3]. The only difference is that we
adopt a non-deterministic presentation of CbV, subsuming both the left-to-right
and the right-to-left strategies in [3], which makes our results slightly more general.
Such non-determinism is harmless: not only is CbV evaluation confluent,
it even has the diamond property, so that all evaluations have the same length.
Moreover, the non-deterministic presentation, together with the postponement 
of erasing steps discussed below, allows us to see CbNeed as a fragment of CbV, 
which shall provide a free proof that CbNeed duplicates as wisely as CbV. 


Terms and Contexts. The set of terms $\Lambda_{lsc}$ of the LSC is given by the grammar
below, where $t[x{\leftarrow}s]$ is an explicit substitution (shortened to ES), that is a more
compact notation for let x = s in t (intuitively, “t where x will be substituted
by s”). Both $\lambda x.t$ and $t[x{\leftarrow}s]$ bind $x$ in $t$, with the usual notion of α-equivalence.


\[
\textsc{LSC Terms} \quad t, s, u ::= x \mid v \mid ts \mid t[x{\leftarrow}s]
\qquad\qquad
\textsc{LSC Values} \quad v ::= \lambda x.t
\]


The set $fv(t)$ of free variables of a term $t$ is defined as expected, in particular,
$fv(t[x{\leftarrow}s]) := (fv(t) \setminus \{x\}) \cup fv(s)$. A term $t$ is closed if $fv(t) = \emptyset$, open otherwise.
As usual, terms are identified up to α-equivalence.
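
To make the grammar and the free-variable function concrete, here is a minimal sketch of LSC terms as Python dataclasses. The class names (`Var`, `Lam`, `App`, `ES`) and the helper `is_closed` are choices made for this illustration only; terms are represented with explicit names and are not quotiented by α-equivalence.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:          # value: lambda var. body
    var: str
    body: object

@dataclass(frozen=True)
class App:          # application: fun arg
    fun: object
    arg: object

@dataclass(frozen=True)
class ES:           # explicit substitution: body[var <- arg]
    body: object
    var: str
    arg: object

def fv(t):
    """Free variables of a term, following the clauses in the text."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return fv(t.body) - {t.var}
    if isinstance(t, App):
        return fv(t.fun) | fv(t.arg)
    # ES binds var in body only, not in arg
    return (fv(t.body) - {t.var}) | fv(t.arg)

def is_closed(t):
    return not fv(t)

identity = Lam("z", Var("z"))
shared = ES(App(Var("x"), Var("y")), "x", Var("z"))   # (x y)[x <- z]
print(fv(shared), is_closed(identity))
```

Note that in `(x y)[x <- z]` the substituted term `z` is outside the scope of `x`, which is why `fv` subtracts `x` only from the body.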
Contexts are terms with exactly one occurrence of the hole $\langle\cdot\rangle$, an additional
constant. We shall use many different contexts. The most general ones are weak
contexts $W$ (i.e. not under abstractions). The (evaluation) contexts $C$, $V$ and
$E$, used to define CbN, CbV and CbNeed evaluation strategies, respectively,
are special cases of weak contexts (in fact, CbV contexts coincide with weak
contexts, the consequences of that are discussed on p. 8). To define evaluation
strategies, substitution contexts (i.e. lists of explicit substitutions) also play a
role.


\[
\begin{array}{lrcl}
\textsc{Weak contexts} & W & ::= & \langle\cdot\rangle \mid Wt \mid W[x{\leftarrow}t] \mid tW \mid t[x{\leftarrow}W] \\
\textsc{Substitution contexts} & S & ::= & \langle\cdot\rangle \mid S[x{\leftarrow}t] \\
\textsc{CbN contexts} & C & ::= & \langle\cdot\rangle \mid Ct \mid C[x{\leftarrow}t] \\
\textsc{CbV contexts} & V & ::= & W \\
\textsc{CbNeed contexts} & E & ::= & \langle\cdot\rangle \mid Et \mid E[x{\leftarrow}t] \mid E\langle\!\langle x\rangle\!\rangle[x{\leftarrow}E']
\end{array}
\]


We write $W\langle t\rangle$ for the term obtained by replacing the hole $\langle\cdot\rangle$ in context
$W$ by the term $t$. This plugging operation, as usual with contexts, can capture
variables; for instance $((\langle\cdot\rangle t)[x{\leftarrow}s])\langle x\rangle = (xt)[x{\leftarrow}s]$. We write $W\langle\!\langle t\rangle\!\rangle$ when we
want to stress that the context $W$ does not capture the free variables of $t$.
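
The difference between capturing and non-capturing plugging can be sketched as follows. This is an assumption-laden illustration: contexts are encoded as terms containing a `Hole` node, and the names `plug`, `captured` and `plug_noncapturing` are hypothetical. Since weak contexts never place the hole under an abstraction, only explicit substitutions can bind variables over the hole.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hole:
    pass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class ES:           # body[var <- arg]
    body: object
    var: str
    arg: object

def fv(t):
    if isinstance(t, Hole):
        return set()
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return fv(t.body) - {t.var}
    if isinstance(t, App):
        return fv(t.fun) | fv(t.arg)
    return (fv(t.body) - {t.var}) | fv(t.arg)

def plug(ctx, t):
    """W<t>: replace the hole by t, possibly capturing free variables of t."""
    if isinstance(ctx, Hole):
        return t
    if isinstance(ctx, App):
        return App(plug(ctx.fun, t), plug(ctx.arg, t))
    if isinstance(ctx, ES):
        return ES(plug(ctx.body, t), ctx.var, plug(ctx.arg, t))
    return ctx      # variables and abstractions: a weak context has no hole here

def captured(ctx):
    """Variables bound over the hole, or None if ctx contains no hole."""
    if isinstance(ctx, Hole):
        return set()
    if isinstance(ctx, App):
        left = captured(ctx.fun)
        return left if left is not None else captured(ctx.arg)
    if isinstance(ctx, ES):
        inner = captured(ctx.body)
        # the ES binder scopes over its body, not over the substituted term
        return inner | {ctx.var} if inner is not None else captured(ctx.arg)
    return None

def plug_noncapturing(ctx, t):
    """W<<t>>: like plug, but refuses to capture (ctx must contain a hole)."""
    clash = captured(ctx) & fv(t)
    if clash:
        raise ValueError(f"context would capture {sorted(clash)}")
    return plug(ctx, t)

# The example from the text: ((<.> t)[x <- s])<x> = (x t)[x <- s]
ctx = ES(App(Hole(), Var("t")), "x", Var("s"))
print(plug(ctx, Var("x")))
```

Calling `plug_noncapturing(ctx, Var("x"))` raises an error in this sketch, whereas a formal treatment would α-rename the context instead; the sketch only detects the clash.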


Micro-step Semantics. The rewriting rules decompose the usual small-step
semantics for λ-calculi, by substituting linearly one variable occurrence at a
time, and only when such an occurrence is in evaluation position. We emphasise
this fact by saying that we adopt a micro-step semantics. We now give the
definitions; examples of evaluation sequences follow.

Formally, a micro-step semantics is defined by first giving its root-steps and 
then taking the closure of root-steps under suitable contexts. 


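\[
\begin{array}{lrcl}
\textsc{Multiplicative root-step} & S\langle \lambda x.t\rangle s & \mapsto_{m} & S\langle t[x{\leftarrow}s]\rangle \\
\textsc{Exponential CbN root-step} & C\langle\!\langle x\rangle\!\rangle[x{\leftarrow}t] & \mapsto_{e_{cbn}} & C\langle\!\langle t\rangle\!\rangle[x{\leftarrow}t] \\
\textsc{Exponential CbV root-step} & V\langle\!\langle x\rangle\!\rangle[x{\leftarrow}S\langle v\rangle] & \mapsto_{e_{cbv}} & S\langle V\langle\!\langle v\rangle\!\rangle[x{\leftarrow}v]\rangle \\
\textsc{Exponential CbNeed root-step} & E\langle\!\langle x\rangle\!\rangle[x{\leftarrow}S\langle v\rangle] & \mapsto_{e_{need}} & S\langle E\langle\!\langle v\rangle\!\rangle[x{\leftarrow}v]\rangle
\end{array}
\]


where, in the root-step $\mapsto_{m}$ (resp. $\mapsto_{e_{cbv}}$; $\mapsto_{e_{need}}$), if $S := [y_1{\leftarrow}s_1]\ldots[y_n{\leftarrow}s_n]$
for some $n \in \mathbb{N}$, then $fv(s)$ (resp. $fv(V\langle\!\langle x\rangle\!\rangle)$; $fv(E\langle\!\langle x\rangle\!\rangle)$) and $\{y_1, \ldots, y_n\}$ are
disjoint. This condition can always be fulfilled by α-equivalence.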

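The evaluation strategies $\to_{cbn}$ for CbN, $\to_{cbv}$ for CbV, and $\to_{need}$ for
CbNeed, are defined as the closure of root-steps under CbN, CbV and CbNeed
evaluation contexts, respectively (so none of the evaluation strategies reduces
under abstractions, since all such contexts are weak):

\[
\begin{array}{l|l|l}
\text{CbN} & \text{CbV} & \text{CbNeed} \\
\to_{m_{cbn}} := C\langle \mapsto_{m} \rangle & \to_{m_{cbv}} := V\langle \mapsto_{m} \rangle & \to_{m_{need}} := E\langle \mapsto_{m} \rangle \\
\to_{e_{cbn}} := C\langle \mapsto_{e_{cbn}} \rangle & \to_{e_{cbv}} := V\langle \mapsto_{e_{cbv}} \rangle & \to_{e_{need}} := E\langle \mapsto_{e_{need}} \rangle \\
\to_{cbn} := C\langle \mapsto_{m} \cup \mapsto_{e_{cbn}} \rangle & \to_{cbv} := V\langle \mapsto_{m} \cup \mapsto_{e_{cbv}} \rangle & \to_{need} := E\langle \mapsto_{m} \cup \mapsto_{e_{need}} \rangle
\end{array}
\]


where the notation $\to \,:=\, W\langle \mapsto \rangle$ means that, given a root-step $\mapsto$, the evaluation
$\to$ is defined as follows: $t \to s$ if and only if there are terms $t'$ and $s'$ and a context
$W$ such that $t = W\langle t'\rangle$ and $s = W\langle s'\rangle$ and $t' \mapsto s'$.

Note that evaluations $\to_{cbn}$, $\to_{cbv}$ and $\to_{need}$ can equivalently be defined
as $\to_{m_{cbn}} \cup \to_{e_{cbn}}$, $\to_{m_{cbv}} \cup \to_{e_{cbv}}$ and $\to_{m_{need}} \cup \to_{e_{need}}$, respectively.

Given an evaluation sequence $d : t \to^*_{cbn} s$ we denote by $|d|$ the length of $d$,
and by $|d|_m$ and $|d|_e$ the number of multiplicative and exponential steps in $d$,
respectively, and similarly for $\to_{cbv}$ and $\to_{need}$.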


Erasing Steps. The reader may be surprised by our evaluation strategies, as none 
of them includes erasing steps, despite the absolute relevance of erasures pointed 
out in the introduction. There are no contradictions: in the LSC, in contrast to
the λ-calculus, erasing steps can always be postponed (see [1]), and so they are
often simply omitted. This is actually close to programming language practice, 
as the garbage collector acts asynchronously with respect to the evaluation flow. 
For the sake of clarity let us spell out the erasing rules—they shall nonetheless 
be ignored in the rest of the paper. In CbN and CbNeed every term is erasable, 
so the root erasing step takes the following form 


t[x←s] ↦_gc t     if x ∉ fv(t)


and it is then closed by weak evaluation contexts. 
In CbV only values are erasable; so, the root erasing step in CbV is: 


t[x←S⟨v⟩] ↦_gc S⟨t⟩     if x ∉ fv(t)
and it is then closed by weak evaluation contexts. 


Example 1. A good example to observe the differences between CbN, CbV, and
CbNeed is given by the term t := ((λx.λy.xx)(II))(II) where I := λz.z is
the identity combinator. In CbN, it evaluates with 5 multiplicative steps and 5
exponential steps, as follows:

t →_mcbn (λy.xx)[x←II](II)
  →_mcbn (xx)[y←II][x←II]
  →_ecbn ((II)x)[y←II][x←II]
  →_mcbn ((z[z←I])x)[y←II][x←II]
  →_ecbn ((I[z←I])x)[y←II][x←II]
  →_mcbn w[w←x][z←I][y←II][x←II]
  →_ecbn x[w←x][z←I][y←II][x←II]
  →_ecbn (II)[w←x][z←I][y←II][x←II]
  →_mcbn x′[x′←I][w←x][z←I][y←II][x←II]
  →_ecbn I[x′←I][w←x][z←I][y←II][x←II]


In CbV, t evaluates with 5 multiplicative steps and 5 exponential steps, for 
instance from right to left, as follows: 
t →_mcbv (λx.λy.xx)(II)(z[z←I])
  →_ecbv (λx.λy.xx)(II)(I[z←I])
  →_mcbv (λx.λy.xx)(w[w←I])(I[z←I])
  →_ecbv (λx.λy.xx)(I[w←I])(I[z←I])
  →_mcbv (λy.xx)[x←I[w←I]](I[z←I])
  →_mcbv (xx)[y←I[z←I]][x←I[w←I]]
  →_ecbv (xI)[y←I[z←I]][x←I][w←I]
  →_ecbv (II)[y←I[z←I]][x←I][w←I]
  →_mcbv x′[x′←I][y←I[z←I]][x←I][w←I]
  →_ecbv I[x′←I][y←I[z←I]][x←I][w←I]


Note that CbN and CbV take the same number of steps only by chance, as
they reduce different redexes: CbN never reduces the unneeded redex
II associated to y, but it reduces twice the needed II redex associated to x,
while CbV reduces both, but each one only once.

In CbNeed, t evaluates in 4 multiplicative steps and 4 exponential steps. 


t →_mneed (λy.xx)[x←II](II)
  →_mneed (xx)[y←II][x←II]
  →_mneed (xx)[y←II][x←z[z←I]]
  →_eneed (xx)[y←II][x←I[z←I]]
  →_eneed (Ix)[y←II][x←I][z←I]
  →_mneed (w[w←x])[y←II][x←I][z←I]
  →_eneed (w[w←I])[y←II][x←I][z←I]
  →_eneed (I[w←I])[y←II][x←I][z←I]
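The step counts of Example 1 can be checked mechanically. The following is a minimal sketch of an evaluator for the CbN and CbNeed micro-step strategies, not the authors' implementation. The tuple encoding of terms and all function names (`rename`, `split`, `wrap`, `head`, `plug`, `step`, `run`) are assumptions made for illustration. The sketch assumes closed input terms and keeps all bound names distinct by renaming every copied subterm, so that no variable capture can occur.

```python
import itertools

_fresh = itertools.count()

def var(x): return ("var", x)
def lam(x, b): return ("lam", x, b)
def app(t, s): return ("app", t, s)
def es(t, x, s): return ("es", t, x, s)          # t[x<-s]

def rename(t, env=None):
    """Copy t, giving every bound variable a globally fresh name."""
    env = env or {}
    tag = t[0]
    if tag == "var":
        return var(env.get(t[1], t[1]))
    if tag == "lam":
        y = f"{t[1]}_{next(_fresh)}"
        return lam(y, rename(t[2], {**env, t[1]: y}))
    if tag == "app":
        return app(rename(t[1], env), rename(t[2], env))
    y = f"{t[2]}_{next(_fresh)}"                 # t[x<-s] binds x in t only
    return es(rename(t[1], {**env, t[2]: y}), y, rename(t[3], env))

def split(t):
    """Write t as S<u>: return the substitution list S and the core u."""
    subs = []
    while t[0] == "es":
        subs.append((t[2], t[3]))
        t = t[1]
    return subs, t

def wrap(subs, t):
    """Inverse of split: put the substitution list S back around t."""
    for x, s in reversed(subs):
        t = es(t, x, s)
    return t

def head(t, need):
    """Variable in evaluation position of t, or None if there is none."""
    tag = t[0]
    if tag == "var":
        return t[1]
    if tag == "app":
        return head(t[1], need)
    if tag == "es":
        h = head(t[1], need)
        return head(t[3], need) if (need and h == t[2]) else h
    return None

def plug(t, r, need):
    """Replace the occurrence found by head(t, need) with r."""
    tag = t[0]
    if tag == "var":
        return r
    if tag == "app":
        return app(plug(t[1], r, need), t[2])
    if need and head(t[1], need) == t[2]:        # hole is inside the ES content
        return es(t[1], t[2], plug(t[3], r, need))
    return es(plug(t[1], r, need), t[2], t[3])

def step(t, need=False):
    """One evaluation step: returns ("m" or "e", reduct), or None if normal."""
    tag = t[0]
    if tag == "app":
        r = step(t[1], need)
        if r:
            return r[0], app(r[1], t[2])
        subs, core = split(t[1])
        if core[0] == "lam":                     # S<\x.b> s  ->  S<b[x<-s]>
            return "m", wrap(subs, es(core[2], core[1], t[2]))
    elif tag == "es":
        body, x, arg = t[1], t[2], t[3]
        r = step(body, need)
        if r:
            return r[0], es(r[1], x, arg)
        if head(body, need) == x:
            if not need:                         # CbN: substitute arg as it is
                return "e", es(plug(body, rename(arg), need), x, arg)
            r = step(arg, need)                  # CbNeed: evaluate arg first
            if r:
                return r[0], es(body, x, r[1])
            subs, v = split(arg)
            if v[0] == "lam":                    # then substitute the value
                return "e", wrap(subs, es(plug(body, rename(v), need), x, v))
    return None

def run(t, need=False):
    """Evaluate to normal form; return (multiplicative, exponential) counts."""
    t, m, e = rename(t), 0, 0
    while (r := step(t, need)):
        kind, t = r
        m, e = m + (kind == "m"), e + (kind == "e")
    return m, e

I = lam("z", var("z"))
t = app(app(lam("x", lam("y", app(var("x"), var("x")))), app(I, I)), app(I, I))
print(run(t))              # CbN:    (5, 5)
print(run(t, need=True))   # CbNeed: (4, 4)
```

CbV is left out of the sketch because its presentation here is non-deterministic, so an evaluator would have to fix a scheduling of redexes first.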


CbV Diamond Property. CbV contexts coincide with weak ones. As a conse- 
quence, our presentation of CbV is non-deterministic, as for instance one can 
have 


I[x←I](y[y←I])  _ecbv←  x[x←I](y[y←I])  →_ecbv  x[x←I](I[y←I])


but it is easily seen that diagrams can be closed in exactly one step (if the two 
reducts are different). For instance, 


I[x←I](y[y←I])  →_ecbv  I[x←I](I[y←I])  _ecbv←  x[x←I](I[y←I])


Moreover, the kind of steps is preserved, as the example illustrates. This is an 
instance of the strong form of confluence called diamond property. A consequence 
is that either all evaluation sequences normalise or all diverge, and if they nor- 
malise they have all the same length and the same number of steps of each 
kind. Roughly, the diamond property is a form of relaxed determinism. In par- 
ticular, it makes sense to talk about the number of multiplicative/exponential 
steps to normal form, independently of the evaluation sequence. The proof of 
the property is an omitted routine check of diagrams. 


Normal Forms. We use two predicates to characterise normal forms: one for
both CbN and CbNeed normal forms, for which ES can contain any term,
and one for CbV normal forms, where ES can only contain normal terms:


                       normal(t)                              normal_cbv(t)   normal_cbv(s)
------------       --------------       ----------------      -----------------------------
normal(λx.t)       normal(t[x←s])       normal_cbv(λx.t)           normal_cbv(t[x←s])
Proposition 1 (Syntactic characterization of closed normal forms).
Let t be a closed term.

1. CbN and CbNeed: For r ∈ {cbn, need}, t is r-normal if and only if normal(t).
2. CbV: t is cbv-normal if and only if normal_cbv(t).


The simple structure of normal forms is the main point where the restriction 
to closed calculi plays a role in this paper. 

From the syntactic characterization of normal forms (Proposition 1) it follows 
immediately that among closed terms, normal forms for CbN and CbNeed coin- 
cide, while normal forms for CbV are a subset of them. Such a subset is proper 
since the closed term I[x←δδ] (where I := λz.z and δ := λy.yy) is normal for
CbN and CbNeed but not for CbV (and it cannot normalise in CbV). 


3 Preliminaries About Multi Types 


In this section we define basic notions about multi types, type contexts, and 
(type) judgements that are shared by the three typing systems of the paper. 


Multi-sets. The type systems are based on two layers of types, defined in a 
mutually recursive way, linear types L and finite multi-sets M of linear types. 
The intuition is that a linear type L corresponds to a single use of a term, and 
that an argument t is typed with a multi-set M of n linear types if it is going 
to end up (at most) n times in evaluation position, with respect to the strategy 
associated with the type system. The three systems differ on the definition of 
linear types, that is therefore not specified here, while all adopt the same notion 
of finite multi-set M of linear types (named multi type), that we now introduce: 


MULTI TYPES     M, N ::= [Li]_{i∈J}     (for any finite set J)


where [...] denotes the multi-set constructor. The empty multi-set [] (the multi 
type obtained for J = Ø) is called empty (multi) type and denoted by the special 
symbol 0. An example of multi-set is [L, L, L’], that contains two occurrences of 
L and one occurrence of L′. Multi-set union is denoted by ⊎.


Type Contexts. A type context Γ is a (total) map from variables to multi types
such that only finitely many variables are not mapped to 0. The domain of Γ is
the set dom(Γ) := {x | Γ(x) ≠ 0}. The type context Γ is empty if dom(Γ) = ∅.

Multi-set union ⊎ is extended to type contexts point-wise, i.e. (Γ ⊎ Π)(x) :=
Γ(x) ⊎ Π(x) for each variable x. This notion is extended to a finite family
of type contexts as expected, so that ⊎_{i∈J} Γi denotes a finite union of type
contexts; it stands for the empty context when J = ∅. A type context Γ is
denoted by x1:M1,...,xn:Mn (for some n ∈ ℕ) if dom(Γ) ⊆ {x1,...,xn} and
Γ(xi) = Mi for all 1 ≤ i ≤ n. Given two type contexts Γ and Π such that
dom(Γ) ∩ dom(Π) = ∅, the type context Γ, Π is defined by (Γ, Π)(x) := Γ(x) if
x ∈ dom(Γ), (Γ, Π)(x) := Π(x) if x ∈ dom(Π), and (Γ, Π)(x) := 0 otherwise.
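As an illustration of these definitions, here is a small sketch that models multi types with Python's `collections.Counter` and type contexts with dictionaries. Representing linear types as plain strings, and the helper names `union` and `dom`, are assumptions made only for this example.

```python
from collections import Counter

def union(*contexts):
    """Point-wise multi-set union of type contexts (dict: variable -> Counter)."""
    out = {}
    for ctx in contexts:
        for x, M in ctx.items():
            out[x] = out.get(x, Counter()) + M
    return out

def dom(ctx):
    """Variables that are not mapped to the empty multi type 0."""
    return {x for x, M in ctx.items() if sum(M.values()) > 0}

# Gamma maps x to [L, L'] and y to 0; Pi maps x to [L].
gamma = {"x": Counter(["L", "L'"]), "y": Counter()}
pi = {"x": Counter(["L"])}
both = union(gamma, pi)

print(both["x"])   # two occurrences of L and one of L'
print(dom(both))   # only x: y is mapped to the empty multi type
print(union())     # the union of an empty family is the empty context
```

Variables absent from a dictionary are read as mapped to 0, which matches the convention that a context is a total map with finite domain.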
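------------------- ax
x:[L] ⊢^(0,1) x : L

----------------------- normal
⊢^(0,0) λx.t : normal

Γ, x:M ⊢^(m,e) t : L
-------------------------- fun
Γ ⊢^(m,e) λx.t : M ⊸ L

(Γi ⊢^(mi,ei) t : Li)_{i∈J}
------------------------------------------------------- many
⊎_{i∈J} Γi ⊢^(Σ_{i∈J} mi, Σ_{i∈J} ei) t : [Li]_{i∈J}

Γ ⊢^(m,e) t : M ⊸ L     Π ⊢^(m′,e′) s : M
-------------------------------------------- app
Γ ⊎ Π ⊢^(m+m′+1, e+e′) ts : L

Γ, x:M ⊢^(m,e) t : L     Π ⊢^(m′,e′) s : M
-------------------------------------------- ES
Γ ⊎ Π ⊢^(m+m′, e+e′) t[x←s] : L


Fig. 1. Type system for CbN evaluation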


Judgements. Type judgements are of the form Γ ⊢^(m,e) t : L or Γ ⊢^(m,e) t : M
(noted also ⊢^(m,e) t : L and ⊢^(m,e) t : M, respectively, when Γ is the empty
context), where the indices m and e are natural numbers whose intended meaning
is that t evaluates to normal form in m multiplicative steps and e exponential
steps, with respect to the evaluation strategy associated with the type system.

To make clear in which type system the judgement is derived, we write
Φ ▷_cbn Γ ⊢^(m,e) t : L if Φ is a derivation in the CbN system ending in the
judgement Γ ⊢^(m,e) t : L, and similarly for CbV and CbNeed.


4 Types by Name 


In this section we introduce the CbN multi type system, together with intuitions 
about multi types. We also prove that derivations provide exact bounds on CbN 
evaluation sequences, and define the induced denotational model. 


CbN Types. The system is essentially a reformulation of de Carvalho’s system
R [23], itself being a type-based presentation of the relational model of the CbN
λ-calculus induced by the relational model of linear logic via the CbN translation
of the λ-calculus into linear logic. Definitions:

— CbN linear types are given by the following grammar: 


CBN LINEAR TYPES     L, L′ ::= normal | M ⊸ L


Multi(-sets) types are defined as in Sect.3, relatively to CbN linear types. 
Note the linear constant normal (used to type abstractions, which are normal 
terms): it plays a crucial role in our quantitative analysis of CbN evaluation. 

— The CbN typing rules are in Fig. 1.

— The many rule: it has as many premises as the elements in the (possibly 
empty) set of indices J. When J = Ø, the rule has no premises, and it types 
t with the empty multi type 0. The many rule is needed to derive the right 
premises of the rules app and ES, that have a multi type M on their right- 
hand side. Essentially, it corresponds to the promotion rule of linear logic, 
that, in the CbN representation of the λ-calculus, is indeed used for typing
the right subterm of applications and the content of explicit substitutions. 
— The size of a derivation Φ ▷_cbn Γ ⊢^(m,e) t : L is the sum m + e of the indices.
A quick look at the typing rules shows that indices on typing judgements are
not needed, as m can be recovered as the number of app rules, and e as the
number of ax rules. It is however handy to note them explicitly.


Subtleties and Easy Facts. Let us overview some facts about our presentation of
the type system.

1. Introduction and destruction of multi-sets: multi-sets are introduced on the
right by the many rule and on the left by ax. Moreover, on the left they are
summed by app and ES.

2. Vacuous abstractions: the abstraction rule fun can always abstract a variable
x; note that if M = 0, then Γ, x:M is equal to Γ.

3. Relevance: No weakening is allowed in axioms. An easy induction on type
derivations shows that


Lemma 1 (Type contexts and variable occurrences for CbN). Let Φ ▷_cbn
Γ ⊢^(m,e) t : L be a derivation. If x ∉ fv(t) then x ∉ dom(Γ).


Lemma 1 implies that derivations of closed terms have empty type context. Note
that there can be free variables of t not in dom(Γ): the ones only occurring in
subterms not touched by the evaluation strategy.


Key Ingredients. Two key points of the CbN system that play a role in the
design of the CbNeed one in Sect. 6 are:

1. Erasable terms and 0: the empty multi type 0 is the type of erasable terms.
Indeed, abstractions that erase their argument (whose paradigmatic example
is λx.y) can only be typed with 0 ⊸ L, because of Lemma 1. Note that in
CbN every term, even a diverging one, can be typed with 0 by rule many
(taking 0 premises), because, correctly, in CbN every term can be erased.

2. Adequacy and linear types: all CbN typing rules but many assign linear types. 
And many is used only as right premise of the rules app and ES, to derive M. 
It is with respect to linear types, in fact, that the adequacy of the system is 
going to be proved: a term is CbN normalising if and only if it is typable with 
a linear type, given by Theorems 1 and 2 below. 


Tight Derivations. A term may have several derivations, indexed by different 
pairs (m,e). They always provide upper bounds on CbN evaluation lengths. The 
interesting aspect of our type systems, however, is that there is a simple descrip- 
tion of a class of derivations that provide exact bounds for these quantities, as 
we shall show. Their definition relies on the normal type constant. 


Definition 1 (Tight derivations for CbN). A derivation Φ ▷_cbn Γ ⊢^(m,e) t : L
is tight (for CbN) if L = normal and Γ is empty.


Example 2. Let us return to the term t := ((λx.λy.xx)(II))(II) used in
Example 1 for explaining the difference in reduction lengths among the different
strategies. We now give a derivation for it in the CbN type system.
First, let us shorten normal to n. Then, we define Φ as the following derivation
for the subterm λx.λy.xx of t (rules are listed from the leaves to the root):


  ax       x:[[n] ⊸ n] ⊢^(0,1) x : [n] ⊸ n
  ax       x:[n] ⊢^(0,1) x : n
  many     x:[n] ⊢^(0,1) x : [n]
  app      x:[n, [n] ⊸ n] ⊢^(1,2) xx : n
  fun      x:[n, [n] ⊸ n] ⊢^(1,2) λy.xx : 0 ⊸ n
  fun      ⊢^(1,2) λx.λy.xx : [n, [n] ⊸ n] ⊸ (0 ⊸ n)


Now, we need two derivations for II, one of type n, given by Ψ as follows


  ax       z:[n] ⊢^(0,1) z : n
  fun      ⊢^(0,1) λz.z : [n] ⊸ n
  normal   ⊢^(0,0) λw.w : n
  many     ⊢^(0,0) λw.w : [n]
  app      ⊢^(1,1) II : n


and one of type [n] ⊸ n, given by Ξ as follows


  ax       z:[[n] ⊸ n] ⊢^(0,1) z : [n] ⊸ n
  fun      ⊢^(0,1) λz.z : [[n] ⊸ n] ⊸ ([n] ⊸ n)
  ax       w:[n] ⊢^(0,1) w : n
  fun      ⊢^(0,1) λw.w : [n] ⊸ n
  many     ⊢^(0,1) λw.w : [[n] ⊸ n]
  app      ⊢^(1,2) II : [n] ⊸ n


Finally, we put Φ, Ψ and Ξ together in the following derivation Θ for t =
(s(II))(II), where s := λx.λy.xx and n^[n] := [n] ⊸ n


  Φ        ⊢^(1,2) s : [n, n^[n]] ⊸ (0 ⊸ n)
  Ψ        ⊢^(1,1) II : n
  Ξ        ⊢^(1,2) II : n^[n]
  many     ⊢^(2,3) II : [n, n^[n]]
  app      ⊢^(4,5) s(II) : 0 ⊸ n
  many     ⊢^(0,0) II : 0
  app      ⊢^(5,5) (s(II))(II) : n


Note that Θ is a tight derivation and the indices (5,5) correspond to the number
of m_cbn-steps and e_cbn-steps, respectively, from t to its cbn-normal form, as shown
in Example 1. Theorem 1 below shows that this is not by chance: tight derivations
for CbN are minimal and provide exact bounds to evaluation lengths in CbN.
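The bookkeeping of Example 2 can be replayed with a few lines of code. The sketch below encodes the CbN typing rules as functions that compute the type context, the indices and the type of the conclusion from those of the premises. The encoding (multi types as sorted tuples, the names `J`, `ms`, `arrow`, `union`) is an assumption made for illustration; the subject term is left implicit, so the side condition that all premises of many type the same term is not checked.

```python
from collections import namedtuple

J = namedtuple("J", "ctx m e type")    # judgement  ctx |-(m,e) _ : type

def ms(*linear):                        # finite multi-set as a sorted tuple
    return tuple(sorted(linear, key=repr))

def arrow(M, L):                        # linear type  M -o L
    return ("-o", M, L)

def union(*ctxs):                       # point-wise multi-set union
    out = {}
    for c in ctxs:
        for x, M in c.items():
            out[x] = ms(*out.get(x, ()), *M)
    return out

def ax(x, L):
    return J({x: ms(L)}, 0, 1, L)

def normal():
    return J({}, 0, 0, "normal")

def fun(x, j):
    ctx = {y: M for y, M in j.ctx.items() if y != x}
    return J(ctx, j.m, j.e, arrow(j.ctx.get(x, ()), j.type))

def many(*js):
    return J(union(*(j.ctx for j in js)), sum(j.m for j in js),
             sum(j.e for j in js), ms(*(j.type for j in js)))

def app(jt, js):
    assert isinstance(jt.type, tuple) and jt.type[1] == js.type, "type mismatch"
    return J(union(jt.ctx, js.ctx), jt.m + js.m + 1, jt.e + js.e, jt.type[2])

n = "normal"
nn = arrow(ms(n), n)                    # [n] -o n

# Phi types \x.\y.xx, Psi and Xi type II, Theta types the whole term t.
phi = fun("x", fun("y", app(ax("x", nn), many(ax("x", n)))))
psi = app(fun("z", ax("z", n)), many(normal()))
xi = app(fun("z", ax("z", nn)), many(fun("w", ax("w", n))))
theta = app(app(phi, many(psi, xi)), many())

print(theta)   # empty context, indices (5, 5), type normal
```

The final judgement has an empty context and type normal, so the derivation is tight, and its indices agree with the 5 multiplicative and 5 exponential steps of Example 1.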


The next two subsections prove the two halves of the properties of the CbN 
type system, namely correctness and completeness. 
4.1 CbN Correctness


Correctness is the fact that every typable term is CbN normalising. In our setting
it comes with additional quantitative information: the indices m and e of a
derivation Φ ▷_cbn Γ ⊢^(m,e) t : L provide upper bounds on the length of the CbN
evaluation of t, that are exact when the derivation is tight.

The proof technique is standard. Moreover, the correctness theorems for CbV
and CbNeed in the next sections follow exactly the same structure. The proof
relies on a quantitative subject reduction property showing that m decreases
by exactly one at each m_cbn-step, and similarly for e and e_cbn-steps. In turn,
subject reduction relies on a linear substitution lemma. Last, correctness for
tight derivations requires a further property of normal forms.

Let us point out that correctness is stated with respect to closed terms only, 
but the auxiliary results have to deal with open terms, since they are proved by 
inductions (over predicates defined by induction) over the structure of terms. 


Linear Substitution. The linear substitution lemma states that substituting over
a variable occurrence as in the exponential rule consumes exactly one linear type
and decreases the exponential index e by one.


Lemma 2 (CbN linear substitution). If Φ ▷_cbn Γ, x:M ⊢^(m,e) C⟨⟨x⟩⟩ : L
then there is a splitting M = [L′] ⊎ N such that for every derivation Ψ ▷_cbn
Π ⊢^(m′,e′) t : L′ there is a derivation Φ′ ▷_cbn Γ ⊎ Π, x:N ⊢^(m+m′, e+e′−1) C⟨⟨t⟩⟩ : L.


The proof is by induction over CbN evaluation contexts. 


Quantitative Subject Reduction. A key point of multi types is that the size of type
derivations shrinks after every evaluation step, which is what allows one to bound
evaluation lengths. Remarkably, the size (defined as the sum of the indices)
shrinks by exactly 1 at every evaluation step.


Proposition 2 (Quantitative subject reduction for CbN). Let Φ ▷_cbn
Γ ⊢^(m,e) t : L be a derivation.

1. Multiplicative: if t →_mcbn s then m ≥ 1 and there exists a derivation Ψ ▷_cbn
Γ ⊢^(m−1,e) s : L.

2. Exponential: if t →_ecbn s then e ≥ 1 and there exists a derivation Ψ ▷_cbn
Γ ⊢^(m,e−1) s : L.


The proof is by induction on t →_mcbn s and t →_ecbn s, using the linear
substitution lemma for the root exponential step.


Tightness and Normal Forms. Since the indices are always non-negative, quan- 
titative subject reduction (Proposition 2) implies that they bound evaluation 
lengths. The bound is not necessarily exact, as derivations of normal forms can 
have strictly positive indices. If they are tight, however, they are indexed by 
(0,0), as we now show. The proof of this fact (by induction on the predicate 
normal) requires a slightly different statement, for the induction to go through. 
Proposition 3 (normal typing of normal forms for CbN). Let t be such
that normal(t), and Φ ▷_cbn Γ ⊢^(m,e) t : normal be a derivation. Then Γ is empty,
and so Φ is tight, and m = e = 0.


The Tight Correctness Theorem. The theorem is then proved by a straightfor- 
ward induction on the evaluation length relying on quantitative subject reduc- 
tion (Proposition 2) for the inductive case, and the properties of tight typings 
for normal forms (Proposition 3) for the base case. 
Theorem 1 (CbN tight correctness). Let t be a closed term. If Φ ▷_cbn
⊢^(m,e) t : L then there is s such that d: t →*_cbn s, with normal(s), |d|_m ≤ m and
|d|_e ≤ e. Moreover, if Φ is tight then |d|_m = m and |d|_e = e.

Note that Theorem 1 implicitly states that tight derivations have minimal 
size among derivations. 
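The counting argument behind Theorem 1 can be spelled out as a short calculation. The following is only a sketch of the reasoning described above; the names m_i and e_i for the indices of the derivation of the i-th term of the sequence are introduced here for illustration.

```latex
% d : t = t_0 \to_{cbn} t_1 \to_{cbn} \cdots \to_{cbn} t_k = s, with derivations
% \Phi_i \triangleright_{cbn} \vdash^{(m_i,e_i)} t_i : L obtained by Proposition 2.
\begin{align*}
  (m_{i+1}, e_{i+1}) &=
    \begin{cases}
      (m_i - 1,\; e_i) & \text{if } t_i \to_{m_{cbn}} t_{i+1},\\
      (m_i,\; e_i - 1) & \text{if } t_i \to_{e_{cbn}} t_{i+1},
    \end{cases}\\
  m_0 &= m_k + |d|_m, \qquad e_0 = e_k + |d|_e,\\
  |d|_m &\le m_0, \qquad |d|_e \le e_0
    \qquad \text{since } m_k, e_k \ge 0 .
\end{align*}
```

Since m_i + e_i strictly decreases and is non-negative, the sequence must stop at a normal form s. If the initial derivation is tight, then so is the one for s, because type and type context are preserved; Proposition 3 then gives m_k = e_k = 0, and the two inequalities become equalities.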


4.2 CbN Completeness 


Completeness is the fact that every CbN normalising term has a (tight) type 
derivation. As for correctness, the completeness theorem is always obtained via 
three intermediate steps, dual to those for correctness. 


Normal Forms. The first step is to prove (by induction on the predicate normal) 
that every normal form is typable, and is actually typable with a tight derivation. 
Proposition 4 (Normal forms are tightly typable for CbN). Let t be
such that normal(t). Then there is a tight derivation Φ ▷_cbn ⊢^(0,0) t : normal.


Linear Removal. In order to prove subject expansion, we have to first show 
that typability can also be pulled back along substitutions, via a linear removal 
lemma dual to the linear substitution lemma. 

Lemma 3 (Linear removal for CbN). Let Φ ▷_cbn Γ, x:M ⊢^(m,e) C⟨⟨s⟩⟩ : L,
where x ∉ fv(s). Then there exist

- a linear type L′ and two type contexts Γ′ and Π,

- a derivation Φ′ ▷_cbn Γ′ ⊢^(m′,e′) s : L′, and

- a derivation Ψ ▷_cbn Π, x: M ⊎ [L′] ⊢^(m″,e″) C⟨⟨x⟩⟩ : L

such that

- Type contexts: Γ = Γ′ ⊎ Π.

- Indices: (m, e) = (m′ + m″, e′ + e″ − 1).


Quantitative Subject Expansion. This property is the dual of subject reduction. 


Proposition 5 (Quantitative subject expansion for CbN). Let Φ ▷_cbn
Γ ⊢^(m,e) s : L be a derivation.

1. Multiplicative: if t →_mcbn s then there is a derivation Ψ ▷_cbn Γ ⊢^(m+1,e) t : L.
2. Exponential: if t →_ecbn s then there is a derivation Ψ ▷_cbn Γ ⊢^(m,e+1) t : L.


The proof is by induction on t →_mcbn s and t →_ecbn s, using the linear removal
lemma for the root exponential step.

The Tight Completeness Theorem. The theorem is proved by a straightforward
induction on the evaluation length relying on quantitative subject expansion 
(Proposition 5) in the inductive case, and the existence of tight typings for 
normal forms (Proposition 4) in the base case. 


Theorem 2 (CbN tight completeness). Let t be a closed term. If d: t →*_cbn s
and normal(s) then there is a tight derivation Φ ▷_cbn ⊢^(|d|_m,|d|_e) t : normal.

Back to Erasing Steps. Our system can be easily adapted to measure also garbage 
collection steps (the CbN erasing rule is just before Example 1). First, a new, 
third index g on judgements is necessary. Second, one needs to distinguish the 
erasing and non-erasing cases of the app and ES rules, discriminated by the 0 
type. For instance, the ES rules are (the app rules are similar): 


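Γ ⊢^(m,e,g) t : L     Γ(x) = 0
--------------------------------- ES_gc
Γ ⊢^(m,e,g+1) t[x←s] : L

Γ, x:M ⊢^(m,e,g) t : L     Π ⊢^(m′,e′,g′) s : M     M ≠ 0
------------------------------------------------------------ ES
Γ ⊎ Π ⊢^(m+m′, e+e′, g+g′) t[x←s] : L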


The right premise of rule ES_gc has been removed because the only way to
introduce 0 is via a many rule with no premises. The index g bounds the number
of erasing steps. In the closed case, however, the bound cannot be, in general,
exact. Variables typed with 0 by I do not exactly match variables not appearing 
in the typed term (that is the condition triggering the erasing step), because a 
variable typed with 0 may appear in the body of abstractions typed with the 
normal rule, as such bodies are not typed. 

It is reasonable to assume that exact bounds for erasing steps can only be
provided by a type system characterising strong evaluation, whose typing rules 
have to inspect abstraction bodies. These erasing typing rules are nonetheless 
going to play a role in the design of the CbNeed system in Sect. 6. 


4.3 CbN Model 


The idea to build the denotational model from the multi type system is that the 
interpretation (or semantics) of a term is simply the set of its type assignments, 
i.e. the set of its derivable types together with their type contexts. More precisely, 
let t be a term and x1,...,xn (with n ≥ 0) be pairwise distinct variables. If
fv(t) ⊆ {x1,...,xn}, we say that the list x⃗ = (x1,...,xn) is suitable for t. If
x⃗ = (x1,...,xn) is suitable for t, the (relational) semantics of t for x⃗ is


⟦t⟧^CbN_x⃗ := {((M1,...,Mn), L) | ∃ Φ ▷_cbn x1:M1,...,xn:Mn ⊢^(m,e) t : L}.

Subject reduction (Proposition 2) and expansion (Proposition 5) guarantee
that the semantics ⟦t⟧^CbN_x⃗ of t (for any term t, possibly open) is invariant by CbN
evaluation. Correctness (Theorem 1) and completeness (Theorem 2) guarantee
that, given a closed term t, its interpretation ⟦t⟧^CbN is non-empty if and only if
t is CbN normalisable, that is, they imply that relational semantics is adequate.
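------------------- ax
x:M ⊢^(0,1) x : M

Γ ⊢^(m,e) t : [N ⊸ M]     Π ⊢^(m′,e′) s : N
---------------------------------------------- app
Γ ⊎ Π ⊢^(m+m′+1, e+e′) ts : M

Γ, x:N ⊢^(m,e) t : M
-------------------------- fun
Γ ⊢^(m,e) λx.t : N ⊸ M

(Γi ⊢^(mi,ei) λx.t : Li)_{i∈J}
--------------------------------------------------------- many
⊎_{i∈J} Γi ⊢^(Σ_{i∈J} mi, Σ_{i∈J} ei) λx.t : [Li]_{i∈J}

Γ, x:N ⊢^(m,e) t : M     Π ⊢^(m′,e′) s : N
-------------------------------------------- ES
Γ ⊎ Π ⊢^(m+m′, e+e′) t[x←s] : M


Fig. 2. Type system for CbV evaluation.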


In fact, adequacy also holds with respect to open terms. The issue in that 
case is that the characterisation of tight derivations is more involved, see Accat- 
toli, Graham-Lengrand and Kesner’s [7]. Said differently, weaker correctness and 
completeness theorems without exact bounds also hold in the open case. The 
same is true for the CbV and CbNeed systems of the next sections. 


5 Types by Value 


Here we introduce Ehrhard’s CbV multi type system [34] adapted to our presen- 
tation of CbV in the LSC, and prove its properties. The system is similar, and 
yet in many aspects dual, to the CbN one, in particular the grammar of types 
is different. Linear types for CbV are defined by: 


CBV LINEAR TYPES     L, L′ ::= M ⊸ N


Multi(-sets) types are defined as in Sect. 3, relatively to CbV linear types. Note 
that linear types now have a multi type both as source and as target, and that 
the normal constant is absent—in CbV, its role is played by 0. 

The typing rules are in Fig. 2. It is a type-based presentation of the relational
model of the CbV λ-calculus induced by the relational model of linear logic via the
CbV translation of the λ-calculus into linear logic. Some remarks:

— Right-hand types: all rules but fun assign a multi type to the term on the 
right-hand side, and not a linear type as in CbN. 

— Abstractions and many: the many rule has a restricted form with respect to 
the CbN one, it can only be applied to abstractions, that in turn are the only 
terms that can be typed with a linear type. 

— Indices: note that the indices are nonetheless incremented (on ax and app) and
summed (in many and ES) exactly as in the CbN system.


Intuitions: The Empty Type 0. The empty multi-set type 0 plays a special role
in CbV. As in CbN, it is the type of terms that can be erased, but, in contrast 
to CbN, not every term is erasable in CbV. 
In the CbN multi type system every term, even a diverging one, is typable
with 0. On the one hand, this is correct, because in CbN every term can be 
erased, and erased terms can also be divergent, because they are never evaluated. 
On the other hand, adequacy is formulated with respect to non-empty types: a 
term terminates if and only if it is typable with a non-empty type. 

In CbV, instead, terms have to be evaluated before being erased; and, of 
course, their evaluation has to terminate. Thus, terminating terms and erasable 
terms coincide. Since the multi type system is meant to characterise terminating 
terms, in CbV a term is typable if and only if it is typable with 0, as we shall 
prove in this section. The empty type is then not a degenerate type excluded for
adequacy from the interesting types of a term, as in CbN; it rather is the type
characterising (adequate) typability altogether. This is also the reason for
the absence of the constant normal: one way to see it is that in CbV normal = 0.

Note that, in particular, in a type judgement Γ ⊢ t : M the type context Γ
may give the empty type to a variable x occurring in t, as for instance in the
axiom x:0 ⊢ x:0. This may seem very strange to people familiar with CbN
multi types. We hope that instead, according to the provided intuition that 0 is
the type of termination, it would rather seem natural.


Definition 2 (Tight derivation for CbV). A derivation Φ ▷_cbv Γ ⊢^(m,e) t : M
is tight (for CbV) if M = 0 and Γ is empty.


Example 3. Let us consider again the term t := ((λx.λy.xx)(II))(II) of
Example 1 (where I := λz.z), for which a CbN tight derivation was given in Example 2,
and let us type it in the CbV system with a tight derivation.

We define the following derivation Φ1 for the subterm s := λx.λy.xx of t
(rules are listed from the leaves to the root):


  ax       x:[0 ⊸ 0] ⊢^(0,1) x : [0 ⊸ 0]
  ax       x:0 ⊢^(0,1) x : 0
  app      x:[0 ⊸ 0] ⊢^(1,2) xx : 0
  fun      x:[0 ⊸ 0] ⊢^(1,2) λy.xx : 0 ⊸ 0
  many     x:[0 ⊸ 0] ⊢^(1,2) λy.xx : [0 ⊸ 0]
  fun      ⊢^(1,2) s : [0 ⊸ 0] ⊸ [0 ⊸ 0]
  many     ⊢^(1,2) s : [[0 ⊸ 0] ⊸ [0 ⊸ 0]]


Note that [0 ⊸ 0] ⊎ 0 = [0 ⊸ 0], which explains the shape of the type context
in the conclusion of the app rule. Next, we define the derivation Φ2 as follows


  ax       z:[0 ⊸ 0] ⊢^(0,1) z : [0 ⊸ 0]
  fun      ⊢^(0,1) λz.z : [0 ⊸ 0] ⊸ [0 ⊸ 0]
  many     ⊢^(0,1) λz.z : [[0 ⊸ 0] ⊸ [0 ⊸ 0]]
  ax       w:0 ⊢^(0,1) w : 0
  fun      ⊢^(0,1) λw.w : 0 ⊸ 0
  many     ⊢^(0,1) λw.w : [0 ⊸ 0]
  app      ⊢^(1,2) II : [0 ⊸ 0]


and the derivation Φ3 as follows


  ax       x:0 ⊢^(0,1) x : 0
  fun      ⊢^(0,1) λx.x : 0 ⊸ 0
  many     ⊢^(0,1) λx.x : [0 ⊸ 0]
  many     ⊢^(0,0) I : 0
  app      ⊢^(1,1) II : 0

Finally, we put Φ1, Φ2 and Φ3 together in the following derivation Φ for t


  Φ1       ⊢^(1,2) s : [[0 ⊸ 0] ⊸ [0 ⊸ 0]]
  Φ2       ⊢^(1,2) II : [0 ⊸ 0]
  app      ⊢^(3,4) (λx.λy.xx)(II) : [0 ⊸ 0]
  Φ3       ⊢^(1,1) II : 0
  app      ⊢^(5,5) ((λx.λy.xx)(II))(II) : 0


Note that Φ is a tight derivation and the indices (5,5) correspond to the number
of m_cbv-steps and e_cbv-steps, respectively, from t to its cbv-normal form, as shown
in Example 1. Theorem 3 below shows that this is not by chance: tight derivations
for CbV are minimal and provide exact bounds to evaluation lengths in CbV.
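The indices of Example 3 can be recomputed in the same style as for CbN. The sketch below encodes the CbV rules of Fig. 2 as functions on judgements. As before, the encoding (multi types as sorted tuples, the names `J`, `ms`, `arrow`, `union`, `ZERO`) is an assumption made for illustration, and the subject term is left implicit, so the restriction of many to abstractions is only approximated by requiring arrow types in its premises.

```python
from collections import namedtuple

J = namedtuple("J", "ctx m e type")    # judgement  ctx |-(m,e) _ : type

def ms(*linear):                        # finite multi-set as a sorted tuple
    return tuple(sorted(linear, key=repr))

def arrow(N, M):                        # linear type  N -o M
    return ("-o", N, M)

def union(*ctxs):                       # point-wise multi-set union
    out = {}
    for c in ctxs:
        for x, M in c.items():
            out[x] = ms(*out.get(x, ()), *M)
    return out

def ax(x, M):                           # x:M |-(0,1) x : M  (x:0 is the empty context)
    return J({x: M} if M else {}, 0, 1, M)

def fun(x, j):
    ctx = {y: M for y, M in j.ctx.items() if y != x}
    return J(ctx, j.m, j.e, arrow(j.ctx.get(x, ()), j.type))

def many(*js):
    assert all(j.type and j.type[0] == "-o" for j in js), "premises must be linear"
    return J(union(*(j.ctx for j in js)), sum(j.m for j in js),
             sum(j.e for j in js), ms(*(j.type for j in js)))

def app(jt, js):
    assert len(jt.type) == 1, "left premise must have type [N -o M]"
    _, N, M = jt.type[0]
    assert N == js.type, "argument type mismatch"
    return J(union(jt.ctx, js.ctx), jt.m + js.m + 1, jt.e + js.e, M)

ZERO = ()                               # the empty multi type 0
A = ms(arrow(ZERO, ZERO))               # [0 -o 0]

phi1 = many(fun("x", many(fun("y", app(ax("x", A), ax("x", ZERO))))))
phi2 = app(many(fun("z", ax("z", A))), many(fun("w", ax("w", ZERO))))
phi3 = app(many(fun("x", ax("x", ZERO))), many())
phi = app(app(phi1, phi2), phi3)

print(phi)   # empty context, indices (5, 5), type 0
```

The conclusion has an empty context and type 0, so the derivation is tight in the sense of Definition 2, and its indices agree with the CbV evaluation of Example 1.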


Correctness (i.e. typability implies normalisability) and completeness (i.e. 
normalisability implies typability) of the CbV type system with respect to CbV 
evaluation (together with quantitative information about evaluation lengths) 
follow exactly the same pattern of the CbN case, mutatis mutandis. 


5.1 CbV Correctness 


Lemma 4 (CbV linear substitution). Let Φ ▷_cbv Γ, x:M ⊢^(m,e) V⟨⟨x⟩⟩ : N
and v be a value. There is a splitting M = O ⊎ P such that,
for any derivation Ψ ▷_cbv Π ⊢^(m′,e′) v : O, there is a derivation Φ′ ▷_cbv
Γ ⊎ Π, x:P ⊢^(m+m′, e+e′−1) V⟨⟨v⟩⟩ : N.


Proposition 6 (Quantitative subject reduction for CbV). Let Φ ▷_cbv
Γ ⊢^(m,e) t : M be a derivation.
1. Multiplicative: if t →_mcbv t′ then m ≥ 1 and there exists a derivation Φ′ ▷_cbv
Γ ⊢^(m−1,e) t′ : M.
2. Exponential: if t →_ecbv t′ then e ≥ 1 and there exists a derivation Φ′ ▷_cbv
Γ ⊢^(m,e−1) t′ : M.


Proposition 7 (Tight typings for normal forms for CbV). Let Φ ▷_cbv
Γ ⊢^(m,e) t : 0 be a derivation, with normal_cbv(t). Then Γ is empty, and so Φ is
tight, and m = e = 0.


Theorem 3 (CbV tight correctness). Let t be a closed term. If Φ ▷_cbv
⊢^(m,e) t : M then there is s such that d: t →*_cbv s, with normal_cbv(s), |d|_m ≤ m
and |d|_e ≤ e. Moreover, if Φ is tight then |d|_m = m and |d|_e = e.
5.2 CbV Completeness


Proposition 8 (Normal forms are tightly typable for CbV). Let t be
such that normal_cbv(t). Then there exists a tight derivation Φ ▷_cbv ⊢^(0,0) t : 0.


Lemma 5 (Linear removal for CbV). Let Φ ▷_cbv Γ, x:M ⊢^(m,e) V⟨⟨v⟩⟩ : N
and v be a value, where x ∉ fv(v). Then, there exist

- a multi type M′ and two type contexts Γ′ and Π,

- a derivation Φ′ ▷_cbv Γ′ ⊢^(m′,e′) v : M′ and

- a derivation Ψ ▷_cbv Π, x: M ⊎ M′ ⊢^(m″,e″) V⟨⟨x⟩⟩ : N

such that

- Type contexts: Γ = Γ′ ⊎ Π,

- Indices: (m, e) = (m′ + m″, e′ + e″ − 1).

Proposition 9 (Quantitative subject expansion for CbV). Let Φ ▷_cbv
Γ ⊢^(m,e) t′ : M be a derivation.

1. Multiplicative: if t →_mcbv t′ then there is a derivation Φ′ ▷_cbv Γ ⊢^(m+1,e) t : M.
2. Exponential: if t →_ecbv t′ then there is a derivation Φ′ ▷_cbv Γ ⊢^(m,e+1) t : M.

Theorem 4 (CbV tight completeness). Let t be a closed term. If d: t →*_cbv s
with normal_cbv(s), then there is a tight derivation Φ ▷_cbv ⊢^(|d|_m,|d|_e) t : 0.


CbV Model. The interpretation of terms with respect to the CbV system is
defined as follows (where x⃗ = (x1,...,xn) is a list of variables suitable for t):


⟦t⟧^CbV_x⃗ := {((M1,...,Mn), N) | ∃ Φ ▷_cbv x1:M1,...,xn:Mn ⊢^(m,e) t : N}.


Note that rule fun assigns a linear type but the interpretation considers only
multi types. The invariance and the adequacy of ⟦t⟧^CbV_x⃗ with respect to CbV
evaluation are obtained exactly as for the CbN case.


6 Types by Need 


CbNeed as a Blend of CbN and CbV. The multi type system for CbNeed is
obtained by carefully blending ingredients from the CbN and CbV ones:

— Wise erasures from CbN: in CbN wise erasures are induced by the fact that 
the empty multi type 0 (the type of erasable terms) and the linear type normal 
(the type of normalisable terms) are distinct and every term is typable with 
0 by using the many rule with 0 premises. Adequacy is then formulated with 
respect to (non-empty) linear types. 

— Wise duplications from CbV: in CbV wise duplications are due to two 
aspects. First, only abstractions can be collected in multi-sets by rule many. 
This fact accounts for the evaluation of arguments to normal form—that is, 
abstractions—before being substituted. Second, terms are typed with multi 
types instead of linear types. Roughly, this second fact allows the first one to 
actually work because the argument is reduced once for a whole multi set of 
types, and not once for each element of the multi set, as in CbN. 
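------------------- ax
x:M ⊢^(0,1) x : M

Γ ⊢^(m,e) t : [N ⊸ M]     Π ⊢^(m′,e′) s : N
---------------------------------------------- app
Γ ⊎ Π ⊢^(m+m′+1, e+e′) ts : M

---------------- many_0
⊢^(0,0) t : 0

(Γi ⊢^(mi,ei) λx.t : Li)_{i∈J}     J ≠ ∅
--------------------------------------------------------- many_>0
⊎_{i∈J} Γi ⊢^(Σ_{i∈J} mi, Σ_{i∈J} ei) λx.t : [Li]_{i∈J}

Γ, x:N ⊢^(m,e) t : M
-------------------------- fun
Γ ⊢^(m,e) λx.t : N ⊸ M

Γ, x:N ⊢^(m,e) t : M     Π ⊢^(m′,e′) s : N
-------------------------------------------- ES
Γ ⊎ Π ⊢^(m+m′, e+e′) t[x←s] : M

----------------------- normal
⊢^(0,0) λx.t : normal


Fig. 3. Naive type system for CbNeed evaluation.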


It seems then that a type system for CbNeed can easily be obtained by basically
adopting the CbV system plus

— separating 0 and normal, that is, adding normal to the system; 

— modifying the many rule by distinguishing two cases: with 0 premises it can 
assign 0 to whatever term—as in CbN—otherwise it is forced to work on 
abstractions, as in CbV; 

— restricting adequacy to non-empty types. 

Therefore, the grammar of linear types is: 


CBNEED LINEAR TYPES     L, L′ ::= normal | M ⊸ N


Multi(-sets) types are defined as in Sect.3, relatively to CbNeed linear types. 
The rules of this naive system for CbNeed are in Fig. 3. 


Issue with the Naive System. Unfortunately, the naive system does not work:
tight derivations (defined as expected: empty type context and the term typed
with [normal]) do not provide exact bounds. The problem is that the naive
blend of ingredients allows derivations of 0 with strictly positive indices m and
e. Instead, derivations of 0 should always have 0 in both indices, as is the
case when they are derived with a many_0 rule with 0 premises, because they
correspond to terms to be erased, which are not evaluated in CbNeed. For any
term t, indeed, one can for instance build the following derivation Φ:


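  ──────────────── many_0
   ⊢^(0,0) x : 0
  ──────────────────────── fun
   ⊢^(0,0) λx.x : 0 ⊸ 0
  ────────────────────────── many_{>0}     ──────────────── many_0
   ⊢^(0,0) λx.x : [0 ⊸ 0]                   ⊢^(0,0) t : 0
  ─────────────────────────────────────────────────────────── app
                  ⊢^(1,0) (λx.x)t : 0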
Note that introducing ⊢^(0,1) x : 0 with rule ax rather than via many_0 (the typing
context x : 0 is equivalent to the empty type context) would give a derivation
with final judgement ⊢^(1,1) (λx.x)t : 0; thus, the system messes up both indices.
Such bad derivations of 0 are not a problem per se, because in CbNeed one
expects correctness and completeness to hold only for derivations of non-empty
multi types. However, they also mess up derivations of non-empty multi types,
because they can still appear inside tight derivations, as sub-derivations of
subterms to be erased; consider for instance:
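  ──────────────────── normal
   ⊢^(0,0) I : normal
  ────────────────────── many_{>0}
   ⊢^(0,0) I : [normal]
  ────────────────────────────── fun
   ⊢^(0,0) λy.I : 0 ⊸ [normal]                     ⋮ Φ
  ──────────────────────────────── many_{>0}
   ⊢^(0,0) λy.I : [0 ⊸ [normal]]           ⊢^(1,0) (λx.x)t : 0
  ────────────────────────────────────────────────────────────── app
             ⊢^(2,0) (λy.I)((λx.x)t) : [normal]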
The term normalises in just 1 m_need-step to I[y←(λx.x)t], but the multiplicative
index of the derivation is 2. The mismatch is due to a bad derivation of 0 used
as right premise of an app rule. Similarly, the induced typing of I[y←(λx.x)t] is
an example of a bad derivation used as right premise of a rule ES:


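  ──────────────────── normal
   ⊢^(0,0) I : normal                        ⋮ Φ
  ────────────────────── many_{>0}
   ⊢^(0,0) I : [normal]              ⊢^(1,0) (λx.x)t : 0
  ───────────────────────────────────────────────────────── ES
            ⊢^(1,0) I[y←(λx.x)t] : [normal]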
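
The index arithmetic behind these bad derivations can be replayed mechanically. The following Python sketch is only an illustration under a strong simplification: a derivation is abstracted to its pair of indices (m, e) and types are ignored. The function names (many0, many, fun, app, es) are hypothetical stand-ins for the naive rules of Fig. 3, not notation from the paper.

```python
# Each derivation is abstracted to its index pair (m, e); types are ignored.
# Function names are illustrative stand-ins for the naive rules of Fig. 3.

def many0():
    """many_0: assigns 0 to any term, with indices (0, 0)."""
    return (0, 0)

def normal():
    """normal: an abstraction typed with normal, indices (0, 0)."""
    return (0, 0)

def ax():
    """ax: a variable occurrence, indices (0, 1)."""
    return (0, 1)

def fun(body):
    """fun: the indices are those of the body."""
    return body

def many(*premises):
    """many_{>0}: sums the indices of its premises."""
    return (sum(m for m, _ in premises), sum(e for _, e in premises))

def app(left, right):
    """Naive app: adds 1 to m, whatever the type of the right premise."""
    return (left[0] + right[0] + 1, left[1] + right[1])

def es(left, right):
    """Naive ES: sums the indices of both premises."""
    return (left[0] + right[0], left[1] + right[1])

# Phi: (λx.x)t : 0, with x introduced by many_0.
phi = app(many(fun(many0())), many0())
# Variant where x : 0 is introduced by ax instead.
phi_ax = app(many(fun(ax())), many0())
# (λy.I)((λx.x)t) : [normal], with Phi as right premise of app.
bad_app = app(many(fun(many(normal()))), phi)
# I[y←(λx.x)t] : [normal], with Phi as right premise of ES.
bad_es = es(many(normal()), phi)

print(phi, phi_ax, bad_app, bad_es)  # (1, 0) (1, 1) (2, 0) (1, 0)
```

In this abstraction, the pair (2, 0) overshoots the single m_need-step of the first term, and the pair (1, 0) is attached to I[y←(λx.x)t], which is the result of that step.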


The Actual Type System. Our solution to this issue is to modify the system
so as to prevent derivations of 0 from appearing as right premises of rules app and ES.
We follow the schema of the rules for counting erasing steps given right after
Theorem 2.

Therefore, we add two dedicated rules app_gc and ES_gc, and constrain the
right premise of rules app and ES to have a non-empty type. The system is in
Fig. 4 and it is based on the same grammar of types as the naive system. Note
that rules many and ax can still introduce 0. These 0s, however, can no longer
mess up the indices of tight derivations, as we are going to show.

Note that the indices m and e are incremented and summed exactly as in 
the CbN and CbV type systems. 


Definition 3 (Tight derivations for CbNeed). A derivation Φ ▷_need
Γ ⊢^(m,e) t : M is tight (for CbNeed) if M = [normal] and Γ is empty.


Example 4. We return to the term t := ((λx.λy.xx)(II))(II) used in Example 1
and we give it a tight derivation in the CbNeed type system.
Again, we shorten normal to n. Then, we define Ψ as follows
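  ───────────────────── ax          ─────────────────────── normal
   x : M ⊢^(0,1) x : M               ⊢^(0,0) λx.t : normal

   Γ, x : N ⊢^(m,e) t : M
  ───────────────────────── fun
   Γ ⊢^(m,e) λx.t : N ⊸ M

   (Γ_i ⊢^(m_i,e_i) λx.t : L_i)_{i∈I}
  ─────────────────────────────────────────────────────────────── many
   ⊎_{i∈I} Γ_i ⊢^(Σ_{i∈I} m_i, Σ_{i∈I} e_i) λx.t : [L_i]_{i∈I}

   Γ ⊢^(m,e) t : [0 ⊸ M]
  ──────────────────────── app_gc
   Γ ⊢^(m+1,e) ts : M

   Γ ⊢^(m,e) t : [N ⊸ M]     Π ⊢^(m',e') s : N     N ≠ 0
  ───────────────────────────────────────────────────────── app
   Γ ⊎ Π ⊢^(m+m'+1, e+e') ts : M

   Γ ⊢^(m,e) t : M     Γ(x) = 0
  ─────────────────────────────── ES_gc
   Γ ⊢^(m,e) t[x←s] : M

   Γ, x : N ⊢^(m,e) t : M     Π ⊢^(m',e') s : N     N ≠ 0
  ────────────────────────────────────────────────────────── ES
   Γ ⊎ Π ⊢^(m+m', e+e') t[x←s] : M

Fig. 4. Type system for CbNeed evaluation.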
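  ───────────────────────────────────────── ax   ───────────────────────── ax
   x : [[n] ⊸ [n]] ⊢^(0,1) x : [[n] ⊸ [n]]        x : [n] ⊢^(0,1) x : [n]
  ────────────────────────────────────────────────────────────────────────── app
                   x : [n, [n] ⊸ [n]] ⊢^(1,2) xx : [n]
                  ───────────────────────────────────────────────── fun
                   x : [n, [n] ⊸ [n]] ⊢^(1,2) λy.xx : 0 ⊸ [n]
                  ───────────────────────────────────────────────── many
                   x : [n, [n] ⊸ [n]] ⊢^(1,2) λy.xx : [0 ⊸ [n]]
                  ───────────────────────────────────────────────── fun
                   ⊢^(1,2) λx.λy.xx : [n, [n] ⊸ [n]] ⊸ [0 ⊸ [n]]
                  ───────────────────────────────────────────────── many
                   ⊢^(1,2) λx.λy.xx : [[n, [n] ⊸ [n]] ⊸ [0 ⊸ [n]]]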
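and, shortening [n] ⊸ [n] to [n]^[n], we define Θ as follows. The left premise is

  ──────────────────────────────────────────── ax
   z : [n, [n]^[n]] ⊢^(0,1) z : [n, [n]^[n]]
  ──────────────────────────────────────────────── fun
   ⊢^(0,1) λz.z : [n, [n]^[n]] ⊸ [n, [n]^[n]]
  ──────────────────────────────────────────────── many
   ⊢^(0,1) λz.z : [[n, [n]^[n]] ⊸ [n, [n]^[n]]]

the right premise is

                                  ───────────────────────── ax
                                   w : [n] ⊢^(0,1) w : [n]
  ─────────────────── normal      ───────────────────────── fun
   ⊢^(0,0) λw.w : n                ⊢^(0,1) λw.w : [n]^[n]
  ─────────────────────────────────────────────────────────── many
               ⊢^(0,1) λw.w : [n, [n]^[n]]

and rule app, applied to these two premises, concludes

   ⊢^(1,2) II : [n, [n]^[n]]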

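Finally, we put Ψ and Θ together in the following derivation Φ for t


                    ⋮ Ψ                                      ⋮ Θ
   ⊢^(1,2) λx.λy.xx : [[n, [n]^[n]] ⊸ [0 ⊸ [n]]]     ⊢^(1,2) II : [n, [n]^[n]]
  ───────────────────────────────────────────────────────────────────────────── app
                     ⊢^(3,4) (λx.λy.xx)(II) : [0 ⊸ [n]]
                    ─────────────────────────────────────── app_gc
                     ⊢^(4,4) ((λx.λy.xx)(II))(II) : [n]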


Note that the indices (4,4) correspond exactly to the number of m_need-steps and
e_need-steps, respectively, from t to its need-normal form (as shown in
Example 1) and that Φ is a tight derivation. Forthcoming Theorem 5 shows once
again that this is not by chance: tight derivations for CbNeed are minimal and
provide exact bounds to evaluation lengths in CbNeed.
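
As a sanity check of the indices in Example 4, the counting discipline of Fig. 4 can be sketched in Python. This is a minimal illustration, not the authors' formalisation: it assumes that a derivation is abstracted to its indices (m, e) plus a flag recording whether the derived multi type is 0, it does not check that many is applied to abstractions only, and all names (Deriv, ax, fun, many, app, app_gc) are chosen here for the example.

```python
from dataclasses import dataclass

# Simplifying assumption: a derivation is abstracted to its indices (m, e)
# plus a flag telling whether the derived multi type is the empty one (0).
@dataclass(frozen=True)
class Deriv:
    m: int               # multiplicative index
    e: int               # exponential index
    empty: bool = False  # True when the derived type is 0

def ax(empty=False):
    """ax: x : M |- x : M with indices (0, 1); M may be 0."""
    return Deriv(0, 1, empty)

def normal():
    """normal: an abstraction typed with the linear type normal."""
    return Deriv(0, 0)

def fun(body):
    """fun: same indices as the body; an arrow type is a linear type."""
    return Deriv(body.m, body.e)

def many(*premises):
    """many: sums the indices; with no premises it derives 0."""
    return Deriv(sum(p.m for p in premises),
                 sum(p.e for p in premises),
                 empty=not premises)

def app(left, right):
    """app: the right premise must have a non-empty type."""
    if right.empty:
        raise ValueError("app needs a non-empty argument type; use app_gc")
    return Deriv(left.m + right.m + 1, left.e + right.e)

def app_gc(left):
    """app_gc: the argument is not typed, only the step is counted."""
    return Deriv(left.m + 1, left.e)

# Psi: λx.λy.xx, built as in Example 4.
xx = app(ax(), ax())
psi = many(fun(many(fun(xx))))
# Theta: II, where the argument I receives the multi type [n, [n] ⊸ [n]].
theta = app(many(fun(ax())), many(normal(), fun(ax())))
# Phi: ((λx.λy.xx)(II))(II), whose outer argument is erased.
phi = app_gc(app(psi, theta))

print((psi.m, psi.e), (theta.m, theta.e), (phi.m, phi.e))  # (1, 2) (1, 2) (4, 4)
```

Calling app with a right premise built by many() with no premises raises an error in this sketch, which mirrors the side condition N ≠ 0 on rule app: an erased argument has to go through app_gc and contributes nothing to the indices.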
Remarkably, the technical development to prove correctness and completeness
of the CbNeed type system with respect to CbNeed evaluation follows
smoothly along the same lines of the two other systems, mutatis mutandis. 


6.1 CbNeed Correctness 


Lemma 6 (CbNeed linear substitution). Let Φ ▷_need Γ, x : M ⊢^(m,e) E⟨⟨x⟩⟩ : N
and v be a value. There is a splitting M = O ⊎ P such that for any derivation
Ψ ▷_need Π ⊢^(m',e') v : O there exists Φ' ▷_need Γ ⊎ Π, x : P ⊢^(m+m', e+e'−1) E⟨⟨v⟩⟩ : N.


Proposition 10 (Quantitative subject reduction for CbNeed). Let
Φ ▷_need Γ ⊢^(m,e) t : M be a derivation such that M ≠ 0.

- Multiplicative: if t →_{m_need} s then m ≥ 1 and there is a derivation
  Φ' ▷_need Γ ⊢^(m−1,e) s : M.

- Exponential: if t →_{e_need} s then e ≥ 1 and there exists a derivation
  Φ' ▷_need Γ ⊢^(m,e−1) s : M.


Note the condition M ≠ 0 in the statement of subject reduction, which is
in contrast to the CbV system but akin to the CbN one. It is due to the way
multi types are used as arguments, via rules ES_gc and app_gc. The restriction is
necessary: the CbNeed type system derives ⊢^(0,1) x[x←δδ] : 0, but x[x←δδ] is
not normalising for CbNeed evaluation. This is expected, as it amounts to
the fact that adequacy holds only with respect to non-empty types, as for CbN,
and as stressed when introducing the CbNeed type system. The same restriction
appears in Theorem 5, Proposition 13 and Theorem 6 below, for the same reason.


Proposition 11 ([normal] typings for normal forms for CbNeed). Let
Φ ▷_need Γ ⊢^(m,e) t : [normal] be a derivation, with normal(t). Then Γ is empty,
and so Φ is tight, and m = e = 0.


Theorem 5 (CbNeed tight correctness). Let t be a closed term. If Φ ▷_need
⊢^(m,e) t : M where M ≠ 0, then there is s such that d : t →*_need s, with normal(s),
|d|_m ≤ m and |d|_e ≤ e. Moreover, if Φ is tight then |d|_m = m and |d|_e = e.


6.2 CbNeed Completeness 


Proposition 12 (Normal forms are tightly typable for CbNeed). Let t
be such that normal(t). Then there is a tight derivation Φ ▷_need ⊢^(0,0) t : [normal].


Lemma 7 (Linear removal for CbNeed). Let Φ ▷_need Γ, x : M ⊢^(m,e)
E⟨⟨v⟩⟩ : N be a derivation and v be a value, with x ∉ fv(v). Then there exist

- a multi type M' and two type contexts Γ' and Π,

- a derivation Ψ ▷_need Π ⊢^(m',e') v : M', and

- a derivation Φ' ▷_need Γ', x : M ⊎ M' ⊢^(m'',e'') E⟨⟨x⟩⟩ : N

such that

- Type contexts: Γ = Γ' ⊎ Π.

- Indices: (m, e) = (m' + m'', e' + e'' − 1).
Proposition 13 (Quantitative subject expansion for CbNeed). Let
Φ ▷_need Γ ⊢^(m,e) s : M be a derivation such that M ≠ 0. Then,

- Multiplicative: if t →_{m_need} s then there is a derivation Φ' ▷_need Γ ⊢^(m+1,e) t : M,
- Exponential: if t →_{e_need} s then there is a derivation Φ' ▷_need Γ ⊢^(m,e+1) t : M.


Theorem 6 (CbNeed tight completeness). Let t be a closed term. If
d : t →*_need s and normal(s) then there exists a tight derivation
Φ ▷_need ⊢^(|d|_m, |d|_e) t : [normal].


CbNeed Model. The interpretation ⟦t⟧^{CbNeed}_{x⃗} with respect to the CbNeed system
is defined as the set (where x⃗ = (x_1, ..., x_n) is a list of variables suitable for t):


{((M_1, ..., M_n), N) | ∃ Φ ▷_need x_1 : M_1, ..., x_n : M_n ⊢^(m,e) t : N and N ≠ 0}.


Note that the right multi type is required to be non-empty. The invariance
and the adequacy of ⟦t⟧^{CbNeed}_{x⃗} with respect to CbNeed evaluation are obtained
exactly as for the CbN and CbV cases.


7 A New Fundamental Theorem for Call-by-Need 


CbNeed Erases Wisely. In the literature, the theorem about CbNeed is the fact 
that it is operationally equivalent to CbN. This result was first proven inde- 
pendently by two groups, Maraist, Odersky, and Wadler [48], and Ariola and 
Felleisen [11], in the nineties, using heavy rewriting techniques. 

Recently, Kesner gave a much simpler proof via CbN multi types [40]. She
uses multi types to first show termination equivalence of CbN and CbNeed, from
which she then infers operational equivalence. Termination equivalence means
that a given term terminates in CbN if and only if it terminates in CbNeed, and
it is a consequence of our slogan that CbN and CbNeed both erase wisely.

With our terminology and notations, Kesner’s result takes the following form. 


Theorem 7 (Kesner [40]). Let t be a closed term.

1. Correctness: if Φ ▷_cbn ⊢^(m,e) t : L then there exists s such that d : t →*_need s,
normal(s), |d|_m ≤ m and |d|_e ≤ e.

2. Completeness: if d : t →*_need s and normal(s) then there is Φ ▷_cbn ⊢^(m,e) t : normal.


Note that, with respect to the other similar theorems in this paper, the result 
does not cover tight derivations and it does not provide exact bounds. In fact, the 
CbN system cannot provide exact bounds for CbNeed, because it does provide 
them for CbN evaluation, that in general is slower than CbNeed. Consider for 
instance the term t in Example 1 and its CbN tight derivation in Example 2: 
the derivation provides indices (5,5) for t (and so t evaluates in 10 CbN steps), 
but t evaluates in 8 CbNeed steps. Closing such a gap is the main motivation
behind this paper, achieved by the CbNeed multi type system in Sect. 6. 
CbNeed Duplicates Wisely. Curiously, in the literature there are no dual results
showing that CbNeed duplicates as wisely as CbV. One of the reasons is that
it is a theorem that does not admit a simple formulation such as operational
or termination equivalence, because CbNeed and CbV are not in such
relationships. Morally, this is subsumed by the logical interpretation according to which
CbNeed corresponds to an affine variant of the linear logic representation of
CbV. Yet, it would be nice to have a precise, formal statement establishing that
CbNeed duplicates as wisely as CbV; we provide it here.

Our result is that the CbV multi type system is correct with respect 
to CbNeed evaluation. In particular, the indices (m,e) provided by a CbV 
type derivation provide bounds for CbNeed evaluation lengths. Two important 
remarks before we proceed with the formal statement: 

- Bounds are not exact: the indices of a CbV derivation do not generally provide
  exact bounds for CbNeed, not even in the case of tight derivations. The
  reason is that CbNeed does not evaluate unneeded subterms (i.e. those typed
  with 0), while CbV does. Consider again the term t of Example 1, for instance,
  whose CbV tight derivation has indices (5,5) (and so t evaluates in 10 CbV
  steps), whereas t evaluates in 8 CbNeed steps.

- Completeness cannot hold: we prove correctness but not completeness simply
  because the CbV system is not complete with respect to CbNeed evaluation.
  Consider for instance (λx.I)Ω: it is CbV untypable by Theorem 4, because
  it is CbV divergent, and yet it is CbNeed normalisable.


CbV Correctness with Respect to CbNeed. Pleasantly, our presentations of CbV
and CbNeed make the proof of the result straightforward. It is enough to
observe that, since we do not consider garbage collection and we adopt a
non-deterministic formulation of CbV, CbNeed is a subsystem of CbV. Formally, if
t →_need s then t →_cbv s, as is easily seen from the definitions (CbNeed reduces
only some subterms of applications and ES, while CbV reduces all such
subterms). The result is then a corollary of the correctness theorem for CbV.


Corollary 1 (CbV correctness w.r.t. CbNeed). Let t be a closed term and
Φ ▷_cbv ⊢^(m,e) t : M be a derivation. Then there exists s such that d : t →*_need s
and normal(s), with |d|_m ≤ m and |d|_e ≤ e.


Since the CbNeed system provides exact bounds (Theorem 5), we obtain that 
CbNeed duplicates as wisely as CbV, when the comparison makes sense, that is, 
on CbV normalisable terms. 


Corollary 2 (CbNeed duplicates as wisely as CbV). Let d : t →*_cbv u with
normal_cbv(u). Then there is d' : t →*_need s with normal(s) and |d'|_m ≤ |d|_m and
|d'|_e ≤ |d|_e.


8 Conclusions 


Contributions. This paper introduces a multi type system for CbNeed evalua- 
tion, carefully blending ingredients from multi type systems for CbN and CbV 
evaluation in the literature. Notably, it is the first type system whose minimal
derivations (explicitly characterised) provide exact bounds for evaluation
lengths. It also characterises CbNeed termination, and thus its judgements pro- 
vide an adequate relational semantics. 

The technical development is simple, and uniform with respect to those of 
CbN and CbV multi type systems. The typing rules count evaluation steps fol- 
lowing exactly the same schema of the CbN and CbV rules. The proofs of cor- 
rectness and completeness also follow exactly the same structure. 

A further side contribution of the paper is a new fundamental result of 
CbNeed, formally stating that it duplicates as wisely as CbV. More precisely, the 
CbV multi type system is (quantitatively) correct with respect to CbNeed eval- 
uation. Pleasantly, our presentations of CbV and CbNeed provide the result for 
free. This result dualizes the other fundamental theorem stating that CbNeed 
erases as wisely as CbN, usually formulated as termination equivalence, and 
recently re-proved by Kesner using CbN multi types [40]. 


Future Work. Recently, Barenbaum et al. extended CbNeed to strong evaluation 
[14], and it is natural to try to extend our type system as well. The definition 
of the system, in particular the extension of tight derivations to that setting, 
seems however far from being evident. Barenbaum, Bonelli, and Mohamed also
apply CbN multi types to a CbNeed calculus extended with pattern matching 
and fixpoints [15], that might be interesting to refine along the lines of our work. 

An orthogonal direction is the study of the denotational models of CbNeed. 
It would be interesting to have a categorical semantics of CbNeed, as well as a 
categorical way of discriminating our quantitative precise model from the quanti- 
tatively lax one given by CbN multi types. It would also be interesting to obtain 
game semantics of CbNeed, hopefully satisfying a strong correspondence with 
our multi types in the style of what happens in CbN [30,31,51,56]. 

A further, unconventional direction is to dualise the inception of the CbNeed 
type system trying to mix silly duplication from CbN and silly erasure from CbV, 
obtaining—presumably—a multi types system measuring a perpetual strategy. 
Abstract. Adding predicate subtyping to higher-order logic yields a very
expressive language in which type-checking is undecidable, making the
definition of a system of verifiable certificates challenging. This work
presents a solution to this issue with a minimal formalization of predicate
subtyping, named PVS-Core, together with a system of verifiable
certificates for PVS-Core, named PVS-Cert. PVS-Cert is based on the
introduction of proof terms and explicit coercions. Its design is similar to
that of PTSs with dependent pairs, with the exception of the definition
of conversion, which is based on a specific notion of reduction →_β*,
corresponding to β-reduction combined with the erasure of coercions. The
use of this reduction instead of the more standard reduction →_βσ makes it
possible to establish a simple correspondence between PVS-Core and PVS-Cert.
On the other hand, a type-checking algorithm is designed for PVS-Cert,
built on proofs of type preservation of →_βσ and strong normalization
of both →_βσ and →_β*. Combining these results, PVS-Cert judgements
are used as verifiable certificates for predicate subtyping. In addition, the
reduction →_βσ is used to define a cut elimination procedure for predicate
subtyping. This definition provides a new tool to study the properties of
predicate subtyping, as illustrated with a proof of consistency.


Keywords: Higher-order logic - Predicate subtyping - Type theory - 
Proof theory 


1 Introduction 


Extending higher-order logic with predicate subtyping yields a very expressive 
type system, used notably at the core of the proof system PVS [17]. However, 
proof judgements and typing judgements become entangled in the presence of 
predicate subtyping, making type-checking undecidable. As a consequence, defin- 
ing a language of verifiable proofs for predicate subtyping becomes challenging. 
In pure higher-order logic, complete judgement derivations are too heavy to be 
used in practice as certificates, but lighter certificates can be produced by remov- 
ing typing rules, recording deduction rules only: as this approach requires the 
decidability of type-checking, it doesn’t apply directly to predicate subtyping. 
This paper presents a new formal language, PVS-Cert, designed to be used 
as a language of verifiable certificates for predicate subtyping. PVS-Cert is built 
starting from a minimal formalization of predicate subtyping named PVS-Core,
by adding explicit proofs and coercions. PVS-Cert is also equipped with a notion 
of cut elimination, which can be used directly to study both PVS-Cert and PVS- 
Core meta-theoretical properties. 


1.1 Extending Higher-Order Logic with Predicate Subtyping 


Higher-order logic is characterized by the coexistence of types and predicates as two 
radically different kinds of attributes to mathematical expressions. For instance, 
the mathematical expression 1 + 1 can be assigned a type Nat expressing that it 
is a natural number, or a predicate Even expressing that it is divisible by two. The 
assignment of types remains very simple: in particular, type-checking is decidable 
in higher-order logic. In return, most attributes of mathematical expressions for- 
mulated as predicates cannot be formulated as types: for instance, being a natural 
number different from 0 is expressible as a predicate, but not as a type. 

Predicate subtyping restores a symmetrical situation between the
expressivity of types and predicates. It is defined as the addition of new types,
referred to as predicate subtypes. Given a predicate P defined on a domain A (e.g.
Even, defined on the domain Nat), the predicate subtype {x : A | P(x)} is defined.
An expression t can be assigned this type if and only if it can be assigned the type A
and P(t) is provable. For instance, if Nonzero is a predicate of domain Nat expressing
that a natural number is different from 0, proving Nonzero(1) allows one to conclude
that 1 admits the type {x : Nat | Nonzero(x)}.

This augmented expressivity of the language of types makes it possible to exclude many
unwanted expressions from reasoning. For instance, defining the denominator
domain of Euclidean division as {x : Nat | Nonzero(x)}, all divisions in which
the denominator is not provably different from zero become ill-typed.

As expressions may have several types, predicate subtyping induces a form 
of subtyping: for instance, as any expression of type {x : Nat|Nonzero(x)} also 
admits the type Nat, the former can be considered as a subtype of the latter. 

As previously mentioned, a major counterpart of this extension of higher-order 
logic is the fact that typing judgements and proof judgements become entangled. 
For instance, proving the equality (1/1) = 1 requires that 1 can be assigned the 
type {x : Nat | Nonzero(x)}, which, in turn, requires proving Nonzero(1). As a
direct consequence, type-checking is not decidable in the presence of predicate
subtyping.
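
As an informal illustration of how typing and proving interact, the following Lean 4 sketch mimics the Nonzero example. It is only an analogy and not a rendering of PVS: Lean's subtype { x : Nat // Nonzero x } carries its proof explicitly inside the term, and the names Nonzero, NonzeroNat, one and safeDiv are chosen here for the example.

```lean
-- Assumption: Lean 4 subtypes stand in for predicate subtypes.
abbrev Nonzero (n : Nat) : Prop := n ≠ 0

-- The analogue of {x : Nat | Nonzero(x)}.
abbrev NonzeroNat := { x : Nat // Nonzero x }

-- Assigning the subtype to 1 requires a proof of Nonzero 1.
def one : NonzeroNat := ⟨1, by decide⟩

-- Division whose denominator domain is the predicate subtype.
def safeDiv (n : Nat) (d : NonzeroNat) : Nat := n / d.val

#eval safeDiv 1 one  -- 1

-- Rejected, since Nonzero 0 is not provable:
-- def zero : NonzeroNat := ⟨0, by decide⟩
```

Checking the definition of one succeeds only because the proof obligation Nonzero 1 can be discharged, which is the entanglement of typing and proving described above.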


1.2 Contributions 


PVS-Core. Higher-order logic, as well as its extension with predicate subtyping,
can be defined in various ways. The first contribution of this paper is the
formalization, in Sect. 2, of a minimal system for predicate subtyping, denoted
PVS-Core. Besides its minimality, the main design choice for this system is the use of
β-equivalence as a conversion relation (or definitional equality).
PVS-Cert and Its Basic Properties. Starting from PVS-Core, the second
contribution of this work is the formalization, in Sect. 3, of a language of
verifiable proofs for PVS-Core. This new language, denoted PVS-Cert, is designed
from PVS-Core with the addition of explicit proof terms, formalized as λ-terms,
as well as the addition, at the level of expressions, of explicit coercions based on
these proof terms. The addition of explicit proof terms follows the Curry-Howard
isomorphism in the sense that PVS-Cert proof terms are typed by their
corresponding formulas.

PVS-Cert is an extension of the Pure Type System (PTS) λ-HOL (see for
instance [4], where λ-HOL as well as the general notion of PTS are defined). More
precisely, PVS-Cert is designed to extend λ-HOL in the same way that PVS-Core
extends higher-order logic (denoted HOL in the following). This situation is
illustrated in this diagram, where vertical arrows represent extensions and horizontal
arrows represent the introduction of explicit proofs (and, in the case of PVS-Core
and PVS-Cert, of explicit coercions).


    PVS-Core ─────→ PVS-Cert
        ↑               ↑
        │               │
       HOL  ─────→   λ-HOL


This choice of a PTS-like system is well-suited to describe reasoning modulo
β: all steps of β-reduction or β-expansion are kept implicit in proof terms, which
keeps them compact. As detailed in Sect. 3.3, PVS-Cert is comparable to
the formalism of PTSs with dependent pairs. However, conversion in PVS-Cert
is defined neither as ≡_β nor as its extension ≡_βσ (see for instance [16]) used in
PTSs with dependent pairs: instead, it uses a new conversion relation ≡_β*,
corresponding to syntactical equality modulo β-reduction and coercion erasure (defined
in Sect. 3.1). This distinctive definition makes it possible to define a simple
correspondence between PVS-Core and PVS-Cert, presented later in Sect. 9.

Basic properties of PVS-Cert are presented in Sect. 4, containing notably the
Church-Rosser property for the reduction →_β* underlying the conversion ≡_β*, as
well as the uniqueness of types: contrary to the case of PVS-Core, a well-typed
term admits a unique type up to ≡_β*.

As in λ-HOL, well-typed terms are organized according to a stratification,
presented in Sect. 5, which includes a class of types, a class of expressions (containing
notably propositions), and a class of proof terms. This stratification is at the core
of the correspondence between PVS-Cert and PVS-Core.


Type Preservation and Strong Normalization. In contrast to the case of
the reduction →_βσ in PTSs with dependent pairs, →_β* is not a type preserving
reduction in PVS-Cert. We prove however in Sect. 6 that →_βσ is a type preserving
reduction in PVS-Cert (Theorem 6).

In Sect. 7, we present the main ideas leading to a proof of strong normalization
for both →_βσ and →_β* (Theorem 7); the details of the proof can be found in the
author's PhD dissertation [1]. Moreover, the strong normalization of the type
preserving reduction →_βσ defines a cut elimination theorem (Theorem 8). This
theorem is used in the remainder of this section to prove the consistency of PVS-Cert.
This result is used in turn at the very end of this work to conclude the consistency
of PVS-Core, illustrating how cut elimination in PVS-Cert can be used to study
the meta-theoretical properties of predicate subtyping.


Type-Checking in PVS-Cert. We present in Sect. 8 the design of a type-checking
algorithm for PVS-Cert, showing that, contrary to the case of PVS-Core,
type-checking is decidable in PVS-Cert. This algorithm is based on the type
preservation of →_βσ as well as the strong normalization of →_βσ and →_β*.


Using PVS-Cert as a System of Verifiable Certificates for PVS-Core.
The connection between PVS-Core and PVS-Cert is formalized in Sect. 9. On the
one hand, a translation from PVS-Cert to PVS-Core is defined through the erasure
of coercions. On the other hand, the choice of conversion ≡_β* in PVS-Cert
makes it possible to define a very simple translation from PVS-Core derivations
to PVS-Cert derivable judgements (Definition 7 and Theorem 11).

These translations are used in Sect. 10 together with the PVS-Cert type-checking
algorithm to define how to use PVS-Cert judgements as verifiable certificates
for PVS-Core, reaching the first purpose of this paper. Such certificates
are much lighter than the PVS-Core derivations represented through them, as they
only require recording one single judgement.

Last, the translations between PVS-Core and PVS-Cert are exploited to trans- 
pose the consistency property, established in PVS-Cert using cut elimination, to 
PVS-Core. This illustrates how the PVS-Cert cut elimination theorem can be used 
to study both PVS-Cert and PVS-Core meta-theoretical properties. 


1.3 Related Works 


The most important related work is the author’s PhD dissertation [1], which con- 
tains detailed versions of all proofs presented in this paper. 

The introduction of predicate subtyping can be traced back to the first-order 
language OBJ2 [9] and its sort constraints, allowing to restrict some typing rela- 
tions to the satisfaction of a predicate. This idea was later refined and combined 
with higher-order logic in the proof system PVS, which is one of the most impor- 
tant systems based on predicate subtyping. Overviews of the PVS specification 
language and its use of predicate subtyping are given for instance in [17] and [20]. 

In the present work, the issue of the undecidability of predicate subtyping is 
handled with the introduction of an alternative system, PVS-Cert. An alternative 
approach to this issue is to weaken the definition of predicate subtyping sufficiently 
to obtain systems in which type-checking remains decidable. This approach has 
been followed in [13, 19]. An intermediate approach is followed in [15], in which
predicate subtyping is weakened sufficiently to allow for run-time type-checking
verifications. However, contrary to the case of PVS, predicate subtyping is not fully
represented in these different systems.


444 F. Gilbert 


As mentioned in the previous section, PVS-Cert is an adaptation of the formalism
of Pure Type Systems (PTSs), sometimes also referred to as Generalized Type
Systems (GTSs), presented for instance in [4]. The definition of PTSs is itself the
result of several successive works, including notably [3, 7, 11, 24-26]. More
specifically, PVS-Cert is derived from the notion of PTSs with dependent pairs, which
has its roots in the system ECC [16]. A subsystem of PVS-Cert, named PVS-Cert⁻
and presented in Sect. 3, corresponds directly to a fragment of ECC (PVS-Cert⁻ is
the system obtained from PVS-Cert by replacing ≡_β* by the standard conversion
≡_βσ of PTSs with dependent pairs). PVS-Cert⁻ is also comparable to the notion
of subset types in Coq [5]. However, contrary to PVS-Cert, PVS-Cert⁻ and subset
types are not well-suited to reflect predicate subtyping, as conversion in these
systems does not reflect conversion in PVS-Core: more precisely, Proposition 5
doesn't hold with ≡_βσ.

Another important related work is [8], in which two systems are presented:
ICC_Σ, a type system with implicit type constructions, and AICC_Σ, a system
obtained from ICC_Σ by adding explicit coercions. ICC_Σ contains several advanced
features, including a generalization of predicate subtypes. The construction of
PVS-Cert from PVS-Core follows the same idea as the construction of AICC_Σ
from ICC_Σ: adding the missing information explicitly in the terms of the language
to recover the decidability of type-checking. The main difference between the two
approaches lies in the complexity of the respective languages. ICC_Σ is a very rich
and complex language, making its analysis difficult: in particular, strong
normalization in ICC_Σ is kept as a conjecture, on which the decidability of type-checking
itself relies. Conversely, PVS-Core is designed as a minimal language including
predicate subtyping, making its analysis simpler.

A variant of predicate subtyping was also formalized as an extension of the cal- 
culus of constructions in [22]. As in the present work, this presentation contains 
two systems connected with each other. On the one hand, it includes one system, 
named Russell, which is comparable to a weakened version of PVS-Core in which
a term t of type A admits the type {x : A | P} even when P[t/x] is not provable.
In this variant of predicate subtyping, named subset equivalence, type-checking is
decidable. On the other hand, this work includes a system with explicit coercions 
which is comparable to PVS-Cert. Contrary to PVS-Core, Russell derivations are 
not intended to contain all information necessary to build complete terms with 
explicit coercions: instead, a translation producing incomplete terms in the sys- 
tem with explicit coercions is presented. This system allows to write programs and 
specifications together in Russell, and to prove their correctness in a second step 
by filling all proof holes produced through the translation, in a way which is similar 
to the functioning of PVS. 

Contrary to the case of PVS-Core and Russell, PVS-Cert and the counterpart 
of Russell with explicit coercions have similar characteristics. Although its theo- 
retical properties are not formalized, this latter system is presented as a simple 
extension of the proof-irrelevant type theory presented in [27]. There exists indeed 
a tight connection between proof irrelevance and PVS-Cert: if one considers for 
instance the usual predicate Even on natural numbers expressing divisibility by 
two, the predicate subtype even = {x : Nat | Even(x)}, and two expressions
with explicit coercions ⟨2, p⟩_even and ⟨2, q⟩_even of this type with p and q two proofs
of Even(2), then the hypothesis of proof irrelevance ensures that the expressions
⟨2, p⟩_even and ⟨2, q⟩_even are convertible, as does the choice of conversion relation
≡_β* in PVS-Cert.
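
The role of proof irrelevance in this example can be observed in a system that has it built in. The following Lean 4 sketch is an analogy only: it relies on Lean's definitional proof irrelevance rather than on coercion erasure, and the names IsEven and EvenNat, as well as the encoding of evenness as n % 2 = 0, are assumptions made for the illustration.

```lean
-- Assumption: evenness is encoded as a remainder test.
abbrev IsEven (n : Nat) : Prop := n % 2 = 0

-- The analogue of even = {x : Nat | Even(x)}.
abbrev EvenNat := { x : Nat // IsEven x }

-- Two elements built from 2 and arbitrary proofs p and q are equal by rfl,
-- because Lean identifies any two proofs of the same proposition.
example (p q : IsEven 2) : (⟨2, p⟩ : EvenNat) = ⟨2, q⟩ := rfl
```

In PVS-Cert the same identification is obtained differently, by a conversion that erases coercions and therefore never compares the proofs p and q.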

This relation between proof irrelevance and predicate subtyping is explored 
further in [27]. Besides the fact that this work is based on the calculus of construc- 
tions and besides some technical differences in the precise definition of conversion 
between the system presented in this paper and PVS-Cert, analyzing the strong 
relation between these two systems appears as a very interesting future work. In 
particular, it would provide a possible strategy for building a proof of strong nor- 
malization for this system from the proof of strong normalization presented in 
Sect. 7. Also following the relation between proof irrelevance and predicate sub- 
typing, the system IITT presented in [2], which is equipped with explicit occur- 
rences of irrelevant terms, also admits some similarities with PVS-Cert. However, 
it is restricted to predicative type theory, in which higher-order reasoning cannot 
be expressed. 

Another important work carried out on predicate subtyping is the presenta- 
tion of a formal semantics for PVS in [18]. This work defines, for some fragment 
of the PVS language including predicate subtyping but also other features such 
as parametric theories, set-theoretical interpretations of types and expressions. 
These interpretations are limited to standard interpretations: the interpretation of 
a function type is the set of all functions from the interpretation of the domain to 
the interpretation of the co-domain, and the interpretation of the type of propo- 
sitions is a set containing exactly two elements, distinguishing true propositions 
from false ones. Such an approach is complementary to the presented paper, which 
is only focused on the distinction between provable propositions and unprovable 
ones. As a possible future work, it would be interesting to adapt the work presented 
in [18] to obtain a notion of standard model for PVS-Core. 


2 PVS-Core: A Minimal Extension of HOL with Predicate Subtyping


This section is dedicated to the first contribution of this work: the formalization
of a minimal system for predicate subtyping. This system is named PVS-Core,
in reference to PVS [17]. The main distinctive design choice for PVS-Core is the
introduction of a conversion relation (or definitional equality), corresponding to
$\beta$-equivalence.


2.1 Definitions 


Variables and Terms. We first define a set of variables $\mathcal{V}$ as the disjoint union
of two countably infinite sets of symbols $\mathcal{V}_{expressions}$ and $\mathcal{V}_{types}$. We introduce the
generic notation $v$ or $w$ to refer to a variable in general, as well as the following
specific notations:


- The notation $X$ or $Y$ refers to variables in $\mathcal{V}_{types}$.
- The notation $x$ or $y$ refers to variables in $\mathcal{V}_{expressions}$.

Then, we define a set of terms as the disjoint union of the three following sets.
The last two are defined together recursively.


- The first set contains a unique symbol: Type.

- The second set is the set of types. It is given by the following grammar:
  $A, B := X \mid \mathrm{Prop} \mid \Pi x : A.B \mid \{x : A \mid P\}$

- The last set is the set of expressions. It is given by the following grammar:
  $t, u, P, Q := x \mid \forall x : A.P \mid P \Rightarrow Q \mid \lambda x : A.t \mid t\,u$


Remark 1. There is no formal distinction between the expressions denoted $t$ or $u$
and the expressions denoted $P$ or $Q$, as all of them refer to expressions in general.
Yet, in the following, the notations $P$ and $Q$ will often be used to refer to expressions
admitting the type Prop, also referred to as formulas or propositions.


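Declarations, Contexts, Judgements. We define:


- Three kinds of declarations:
  $X : \mathrm{Type} \mid x : A \mid P$

- Contexts, denoted $\Gamma$, as lists of declarations:
  $\Gamma := \emptyset \mid \Gamma, X : \mathrm{Type} \mid \Gamma, x : A \mid \Gamma, P$

- Four kinds of judgements:
  $\Gamma \vdash \mathrm{WF} \mid \Gamma \vdash A : \mathrm{Type} \mid \Gamma \vdash t : A \mid \Gamma \vdash P$


We use the notation $DV(\Gamma)$ to refer to the set of variables declared in a context
$\Gamma$: for instance, $DV(P, x : A, X : \mathrm{Type}) = \{x, X\}$.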
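
As a minimal sketch of these definitions, a context can be encoded as a list of declarations, with $DV$ computed by collecting the declared variables. The class and function names below (`TypeDecl`, `EltDecl`, `Assumption`, `declared_vars`) are illustrative assumptions, not part of the paper, and types and propositions are kept as opaque strings.

```python
from dataclasses import dataclass

# Hypothetical encoding of PVS-Core declarations; all names are illustrative.
@dataclass(frozen=True)
class TypeDecl:        # X : Type
    var: str

@dataclass(frozen=True)
class EltDecl:         # x : A
    var: str
    type_: str         # types are opaque strings in this sketch

@dataclass(frozen=True)
class Assumption:      # P (an assumption declares no variable)
    prop: str

def declared_vars(context):
    """Return DV(context): the set of variables declared in a list of declarations."""
    return {d.var for d in context if isinstance(d, (TypeDecl, EltDecl))}

# The example from the text: DV(P, x : A, X : Type) = {x, X}
example = [Assumption("P"), EltDecl("x", "A"), TypeDecl("X")]
print(declared_vars(example))
```

Assumptions contribute nothing to $DV$, which is why the side conditions of the context formation rules only concern variable declarations.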


Reduction. We equip PVS-Core terms with the usual $\beta$-reduction. In the following,
we use the notation $\rhd_\beta$ for the reduction of a $\beta$-redex, $\to_\beta$ for the context
closure of $\rhd_\beta$, $\twoheadrightarrow_\beta$ for the reflexive transitive closure of $\to_\beta$, and $\equiv_\beta$ for the
symmetric closure of $\twoheadrightarrow_\beta$, i.e. $\beta$-conversion.


Derivation Rules. The rules of PVS-Core are the following: 


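Well-formed contexts

\[
\dfrac{}{\emptyset \vdash \mathrm{WF}}\ \text{EMPTY}
\qquad
\dfrac{\Gamma \vdash \mathrm{WF}}{\Gamma, X : \mathrm{Type} \vdash \mathrm{WF}}\ X \in \mathcal{V}_{types} \setminus DV(\Gamma)\ \ \text{TYPEDECL}
\]
\[
\dfrac{\Gamma \vdash A : \mathrm{Type}}{\Gamma, x : A \vdash \mathrm{WF}}\ x \in \mathcal{V}_{expressions} \setminus DV(\Gamma)\ \ \text{ELTDECL}
\qquad
\dfrac{\Gamma \vdash P : \mathrm{Prop}}{\Gamma, P \vdash \mathrm{WF}}\ \text{ASSUMPTION}
\]


Well-formed types

\[
\dfrac{\Gamma \vdash \mathrm{WF}}{\Gamma \vdash X : \mathrm{Type}}\ (X : \mathrm{Type}) \in \Gamma\ \ \text{TYPEVAR}
\qquad
\dfrac{\Gamma \vdash \mathrm{WF}}{\Gamma \vdash \mathrm{Prop} : \mathrm{Type}}\ \text{PROP}
\]
\[
\dfrac{\Gamma, x : A \vdash B : \mathrm{Type}}{\Gamma \vdash \Pi x : A.B : \mathrm{Type}}\ \text{PI}
\qquad
\dfrac{\Gamma, x : A \vdash P : \mathrm{Prop}}{\Gamma \vdash \{x : A \mid P\} : \mathrm{Type}}\ \text{SUBTYPE}
\]


Well-typed expressions

\[
\dfrac{\Gamma \vdash \mathrm{WF}}{\Gamma \vdash x : A}\ (x : A) \in \Gamma\ \ \text{ELTVAR}
\qquad
\dfrac{\Gamma \vdash t : \{x : A \mid P\}}{\Gamma \vdash t : A}\ \text{SUBTYPEELIM1}
\]
\[
\dfrac{\Gamma, x : A \vdash t : B}{\Gamma \vdash \lambda x : A.t : \Pi x : A.B}\ \text{LAM}
\qquad
\dfrac{\Gamma \vdash t : \Pi x : A.B \qquad \Gamma \vdash u : A}{\Gamma \vdash t\,u : B[u/x]}\ \text{APP}
\]
\[
\dfrac{\Gamma, x : A \vdash P : \mathrm{Prop}}{\Gamma \vdash \forall x : A.P : \mathrm{Prop}}\ \text{FORALL}
\qquad
\dfrac{\Gamma, P \vdash Q : \mathrm{Prop}}{\Gamma \vdash P \Rightarrow Q : \mathrm{Prop}}\ \text{IMPLY}
\]
\[
\dfrac{\Gamma \vdash t : A \qquad \Gamma \vdash P[t/x] \qquad \Gamma \vdash \{x : A \mid P\} : \mathrm{Type}}{\Gamma \vdash t : \{x : A \mid P\}}\ \text{SUBTYPEINTRO}
\]
\[
\dfrac{\Gamma \vdash t : A \qquad \Gamma \vdash B : \mathrm{Type}}{\Gamma \vdash t : B}\ A \equiv_\beta B\ \ \text{TYPECONVERSION}
\]


Deductions

\[
\dfrac{\Gamma \vdash \mathrm{WF}}{\Gamma \vdash P}\ P \in \Gamma\ \ \text{AXIOM}
\qquad
\dfrac{\Gamma \vdash P \qquad \Gamma \vdash Q : \mathrm{Prop}}{\Gamma \vdash Q}\ P \equiv_\beta Q\ \ \text{PROPCONVERSION}
\]
\[
\dfrac{\Gamma, P \vdash Q}{\Gamma \vdash P \Rightarrow Q}\ \text{IMPLYINTRO}
\qquad
\dfrac{\Gamma \vdash P \Rightarrow Q \qquad \Gamma \vdash P}{\Gamma \vdash Q}\ \text{IMPLYELIM}
\]
\[
\dfrac{\Gamma, x : A \vdash P}{\Gamma \vdash \forall x : A.P}\ \text{FORALLINTRO}
\qquad
\dfrac{\Gamma \vdash \forall x : A.P \qquad \Gamma \vdash t : A}{\Gamma \vdash P[t/x]}\ \text{FORALLELIM}
\]
\[
\dfrac{\Gamma \vdash t : \{x : A \mid P\}}{\Gamma \vdash P[t/x]}\ \text{SUBTYPEELIM2}
\]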


2.2 A Minimal System Expressing Predicate Subtyping 


Predicate subtyping is expressed in PVS-Core with the term construction {x : A | 
P} and the following rules: 


— SUBTYPE, the rule of formation of predicate subtypes. 
— SUBTYPEINTRO, which is a rule of introduction. 
— SUBTYPEELIM1 and SUBTYPEELIM2, which are rules of elimination. 


The system obtained from PVS-Core by removing the construction $\{x : A \mid P\}$
and these four rules is a formulation of constructive higher-order logic. In particular,
the types of this subsystem correspond to the expected simple types: for any
type of the form $\Pi x : A.B$ in this subsystem, $x$ cannot appear free in $B$, hence this
type is a non-dependent function type. As a consequence, the rule TYPECONVERSION
can be safely removed from this subsystem to obtain a simpler but equivalent
formulation of higher-order logic.

PVS-Core is a minimal constructive system, which can be extended with clas- 
sical reasoning or extensionality principles through the addition of axioms. 

The rule PROPCONVERSION makes it possible to reason modulo $\beta$, which will
be useful in the definition of PVS-Core to keep proof terms compact. The rule
TYPECONVERSION is its counterpart at the level of types, making it possible to
consider typing modulo $\beta$ as well.


3 PVS-Cert: Verifiable Certificates for PVS-Core 


This section is dedicated to the presentation of an alternative system, PVS-Cert, 
which will be used to achieve the purpose of the work: defining a language of veri- 
fiable certificates for predicate subtyping. 

At first glance, there is no need to introduce any new system to design PVS- 
Core certificates: the language of PVS-Core derivations itself is a language of veri- 
fiable proofs for PVS-Core. However, this language is heavy as many parts of PVS- 
Core derivations contain unnecessary or redundant information. As a comparison, 
in higher-order logic, as type-checking is decidable, only the deduction rules need 
to be recorded. 

The main idea in the definition of PVS-Cert as a language of certificates for
predicate subtyping is to formalize proofs as new kinds of terms, in addition to
the types and expressions which are already present in PVS-Core, and to introduce
explicit coercions based on these proof terms in order to ensure the decidability
of type-checking. As a consequence, a complete certificate is simply the typing
judgement of some proof term with its corresponding theorem. Such certificates are
much lighter than PVS-Core derivations, as only one single judgement is recorded.

Moreover, PVS-Cert will be equipped (in Sect. 7) with a definition of cut elim- 
ination, defined as a computation rule on proof terms. 


3.1 Definitions 


As detailed further in Sect. 3.2, the definition of PVS-Cert is strongly related to 
the formalism of PTSs, presented for instance in [4]. 


Terms. We define:


- Sorts $\mathcal{S} = \{\mathrm{Prop}, \mathrm{Type}, \mathrm{Kind}\}$

  We use the notation $s$ to refer to a sort.

- Axioms $\mathcal{A} = \{(\mathrm{Prop}, \mathrm{Type}), (\mathrm{Type}, \mathrm{Kind})\}$

- Rules $\mathcal{R} = \{(\mathrm{Prop}, \mathrm{Prop}, \mathrm{Prop}), (\mathrm{Type}, \mathrm{Type}, \mathrm{Type}), (\mathrm{Type}, \mathrm{Prop}, \mathrm{Prop})\}$

- Variables. The set of variables $\mathcal{V}$ is the disjoint union of three countably infinite
  sets of symbols $\mathcal{V}_{proofs}$, $\mathcal{V}_{expressions}$, and $\mathcal{V}_{types}$. The sets $\mathcal{V}_{expressions}$ and $\mathcal{V}_{types}$
  refer to their respective definitions in PVS-Core, while the set $\mathcal{V}_{proofs}$ is new.
  We use the notation $v$ to refer to a variable and $s(v)$ to refer to the unique sort
  $s$ such that $v \in \mathcal{V}_s$.

- Terms. The set of terms $\mathcal{T}$ is given by the following grammar:

  $M, N, T, U := s \mid v \mid \lambda v : T.M \mid M\,N \mid \Pi v : T.U \mid \{v : T \mid U\} \mid \langle M, N\rangle_T \mid \pi_1(M) \mid \pi_2(M)$

Contexts, Judgements. We define:


- Contexts $\Gamma := \emptyset \mid \Gamma, v : T$
- Judgements $\Gamma \vdash \mathrm{WF} \mid \Gamma \vdash M : T$


As in PVS-Core, the set of variables declared in a context $\Gamma$ is denoted $DV(\Gamma)$.
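
The sorts, axioms, and rules above form a PTS-style specification, which can be written down directly as data. The following sketch is only an illustration: the function names `sort_of_sort` and `product_sort` are hypothetical, and the informal reading of each rule in the comments is an interpretation based on the stratification presented later, not a statement from the definitions themselves.

```python
# Specification copied from the definitions above; function names are illustrative.
SORTS = {"Prop", "Type", "Kind"}
AXIOMS = {("Prop", "Type"), ("Type", "Kind")}
RULES = {
    ("Prop", "Prop", "Prop"),  # informally: implication between propositions
    ("Type", "Type", "Type"),  # informally: function types
    ("Type", "Prop", "Prop"),  # informally: quantification over expressions
}

def sort_of_sort(s):
    """Return s2 such that (s, s2) is an axiom, or None if s has no type."""
    return next((s2 for (s1, s2) in AXIOMS if s1 == s), None)

def product_sort(s1, s2):
    """Return s3 such that (s1, s2, s3) is a rule, or None if no such product exists."""
    return next((s3 for (a, b, s3) in RULES if (a, b) == (s1, s2)), None)

print(product_sort("Type", "Prop"))  # allowed product
print(product_sort("Prop", "Type"))  # no rule: types cannot depend on proofs
```

The absence of a rule starting with (Prop, Type) is what prevents types from being abstracted over proofs in this specification.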


Reduction. The main specificity of PVS-Cert is the use of a distinctive notion
of reduction and conversion. In addition to the usual $\beta$-redex reduction
$(\lambda v : T.M)N \rhd_\beta M[N/v]$, we introduce a new reduction relation $\rhd_*$, defined by the
following rules:


- $\langle M_1, M_2\rangle_T \rhd_* M_1$
- $\pi_1(M) \rhd_* M$


We denote the union of $\rhd_\beta$ and $\rhd_*$ as $\rhd_{\beta*}$. As in the definition of PVS-Core, we use
the notation $\to_{\beta*}$ for the context closure of $\rhd_{\beta*}$, $\twoheadrightarrow_{\beta*}$ for the reflexive transitive
closure of $\to_{\beta*}$, and $\equiv_{\beta*}$ for the symmetric closure of $\twoheadrightarrow_{\beta*}$.

The new relation $\rhd_*$, which can be interpreted as the elimination of a coercion
at the head of a term, allows the expression of predicate subtyping in PVS-Cert.
More detailed motivations and justifications for this definition are given in
Sect. 3.3.
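
As a small worked illustration (not taken from the paper), consider a first projection applied to a pair, written $\pi_1(\langle M_1, M_2\rangle_T)$ for arbitrary terms $M_1$, $M_2$, $T$. It contains two $\rhd_*$-redexes, and both reduction orders reach $M_1$:

```latex
\[
\pi_1(\langle M_1, M_2\rangle_T) \;\rhd_*\; \langle M_1, M_2\rangle_T \;\rhd_*\; M_1
\qquad\text{and}\qquad
\pi_1(\langle M_1, M_2\rangle_T) \;\to_{\beta*}\; \pi_1(M_1) \;\rhd_*\; M_1 .
\]
```

In both cases the second component $M_2$ and the annotation $T$ are discarded, so the result does not depend on them.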


Derivation Rules. The rules of PVS-Cert are defined as follows: 


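\[
\dfrac{}{\emptyset \vdash \mathrm{WF}}\ \text{EMPTY}
\qquad
\dfrac{\Gamma \vdash T : s}{\Gamma, v : T \vdash \mathrm{WF}}\ v \in \mathcal{V}_s \setminus DV(\Gamma)\ \ \text{DECL}
\]
\[
\dfrac{\Gamma \vdash \mathrm{WF}}{\Gamma \vdash v : T}\ (v : T) \in \Gamma\ \ \text{VAR}
\qquad
\dfrac{\Gamma \vdash M : T \qquad \Gamma \vdash U : s}{\Gamma \vdash M : U}\ T \equiv_{\beta*} U\ \ \text{CONVERSION}
\]
\[
\dfrac{\Gamma \vdash \mathrm{WF}}{\Gamma \vdash s_1 : s_2}\ (s_1, s_2) \in \mathcal{A}\ \ \text{SORT}
\qquad
\dfrac{\Gamma \vdash T : s_1 \qquad \Gamma, v : T \vdash U : s_2}{\Gamma \vdash \Pi v : T.U : s_3}\ (s_1, s_2, s_3) \in \mathcal{R}\ \ \text{PROD}
\]
\[
\dfrac{\Gamma, v : T \vdash M : U \qquad \Gamma \vdash \Pi v : T.U : s}{\Gamma \vdash \lambda v : T.M : \Pi v : T.U}\ \text{LAM}
\qquad
\dfrac{\Gamma \vdash M : \Pi v : T.U \qquad \Gamma \vdash N : T}{\Gamma \vdash M\,N : U[N/v]}\ \text{APP}
\]
\[
\dfrac{\Gamma \vdash T : \mathrm{Type} \qquad \Gamma, v : T \vdash U : \mathrm{Prop}}{\Gamma \vdash \{v : T \mid U\} : \mathrm{Type}}\ \text{SUBTYPE}
\]
\[
\dfrac{\Gamma \vdash M : T \qquad \Gamma \vdash N : U[M/v] \qquad \Gamma \vdash \{v : T \mid U\} : \mathrm{Type}}{\Gamma \vdash \langle M, N\rangle_{\{v : T \mid U\}} : \{v : T \mid U\}}\ \text{PAIR}
\]
\[
\dfrac{\Gamma \vdash M : \{v : T \mid U\}}{\Gamma \vdash \pi_1(M) : T}\ \text{PROJ1}
\qquad
\dfrac{\Gamma \vdash M : \{v : T \mid U\}}{\Gamma \vdash \pi_2(M) : U[\pi_1(M)/v]}\ \text{PROJ2}
\]


3.2 An Extension of λ-HOL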


PVS-Cert is an extension of the PTS λ-HOL (see for instance [4]). More precisely,
λ-HOL can be obtained from PVS-Cert by removing the term constructions
$\{v : T \mid U\}$, $\pi_i(M)$, and $\langle M, N\rangle_T$, removing the rules SUBTYPE, PAIR, PROJ1, and
PROJ2, and replacing $\equiv_{\beta*}$ by $\equiv_\beta$ in the CONVERSION rule.

Like PTSs, the formalism of PVS-Cert makes it possible to describe reasoning
modulo $\beta$: all steps of $\beta$-reduction or $\beta$-expansion in reasoning are kept implicit,
which keeps proof terms compact and makes PVS-Cert more scalable. Moreover,
formalizing PVS-Cert as a PTS-like system makes it possible to transpose
some PTS properties to PVS-Cert, such as the thinning property and the
substitution property mentioned in the next section. It also makes it possible to
describe this system using a small number of rules in comparison with PVS-Core,
making the proofs of certain expected properties of PVS-Cert lighter.

The well-typed terms of PVS-Cert are classified into the same classes as in the
case of λ-HOL, involving a class of types, a class of expressions, and a class of proof
terms. This property is presented in Sect. 5, and referred to as stratification.


3.3 Expressing Predicate Subtyping 


The expression of predicate subtyping in PVS-Cert is clarified by the
stratification: indeed, in any derivable judgement,


- terms of the form $\{v : T \mid U\}$ are types, expressing predicate subtypes

- terms of the form $\langle M, N\rangle_T$ or $\pi_1(M)$ are expressions, and correspond respectively
  to explicit coercions going from a type to one of its predicate subtypes
  and back

- terms of the form $\pi_2(M)$ are proofs, expressing the PVS-Core deduction rule
  SUBTYPEELIM2.


As mentioned in the introduction, this formalism used to express predicate subtyping
is very similar to the formalism of dependent pairs, used for instance in
the type system ECC [16]. More precisely, the terms $\{v : T \mid U\}$ are comparable
with types of dependent pairs (usually denoted $\Sigma v : T.U$), the terms $\langle M, N\rangle_T$
are comparable with dependent pairs, and the terms $\pi_i(M)$ are comparable with
projections.

The only difference between PVS-Cert and the formalism of dependent pairs
lies in the choice of conversion $\equiv_{\beta*}$: in the case of a system with dependent pairs,
$\equiv_{\beta*}$ is replaced by the more standard conversion $\equiv_{\beta\sigma}$. This conversion is defined
from the usual reduction $\pi_i\langle M_1, M_2\rangle_T \rhd_\sigma M_i$. We define the relations $\rhd_{\beta\sigma}$, $\to_{\beta\sigma}$,
$\twoheadrightarrow_{\beta\sigma}$, and $\equiv_{\beta\sigma}$ in a similar way to the definitions of $\rhd_{\beta*}$, $\to_{\beta*}$, $\twoheadrightarrow_{\beta*}$, and $\equiv_{\beta*}$.

Applied to types or expressions, the conversion $\equiv_{\beta*}$ includes the more standard
conversion $\equiv_{\beta\sigma}$ (this property is a direct consequence of Theorem 5 together with
the Church-Rosser property of $\to_{\beta\sigma}$). However, this inclusion is strict: for instance,
it is not difficult to find two well-typed terms $\langle M, N_1\rangle_T$ and $\langle M, N_2\rangle_T$ which are
not convertible using $\equiv_{\beta\sigma}$, although they are convertible using $\equiv_{\beta*}$.

As a direct consequence of this property, PVS-Cert is an extension of the system
obtained from it by replacing $\equiv_{\beta*}$ by $\equiv_{\beta\sigma}$, and this extension is strict. In this paper,
this subsystem will be referred to as PVS-Cert$^-$. It is a PTS with dependent pairs,
and corresponds more precisely to the system obtained from the PTS λ-HOL by
adding the single dependent pair rule (Type, Prop, Type). It is strictly included in
the type system ECC presented in [16].
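
The strictness of this inclusion can be illustrated with a small sketch that normalizes terms under each of the two projection-related reductions and compares the results. This is only an illustration under stated assumptions: the term representation (`Var`, `Pair`, `Proj`) and the function names are hypothetical, the fragment contains no $\beta$-redexes, typing is not checked, and convertibility is approximated by syntactic equality of normal forms.

```python
from dataclasses import dataclass

# Hypothetical term representation covering only the fragment needed here.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Pair:            # <fst, snd>_ty
    fst: object
    snd: object
    ty: object

@dataclass(frozen=True)
class Proj:            # pi_index(arg), with index 1 or 2
    index: int
    arg: object

def star_head(t):
    """Head step of the *-reduction: <M1, M2>_T > M1 and pi_1(M) > M."""
    if isinstance(t, Pair):
        return t.fst
    if isinstance(t, Proj) and t.index == 1:
        return t.arg
    return None

def sigma_head(t):
    """Head step of the sigma-reduction: pi_i<M1, M2>_T > M_i."""
    if isinstance(t, Proj) and isinstance(t.arg, Pair):
        return t.arg.fst if t.index == 1 else t.arg.snd
    return None

def normalize(t, head):
    """Normalize subterms first, then apply the given head step until none applies."""
    if isinstance(t, Pair):
        t = Pair(normalize(t.fst, head), normalize(t.snd, head), normalize(t.ty, head))
    elif isinstance(t, Proj):
        t = Proj(t.index, normalize(t.arg, head))
    reduct = head(t)
    return t if reduct is None else normalize(reduct, head)

# Two pairs with the same first component but different second components.
T = Var("T")
p1 = Pair(Var("x"), Var("h1"), T)
p2 = Pair(Var("x"), Var("h2"), T)

print(normalize(p1, star_head) == normalize(p2, star_head))    # same *-normal form
print(normalize(p1, sigma_head) == normalize(p2, sigma_head))  # distinct sigma-normal forms
```

Under the $*$-reduction both pairs collapse to their common first component, whereas under the $\sigma$-reduction each pair is already in normal form and the second components keep them apart.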

As mentioned in the introduction, this choice of a strictly more flexible conversion
makes it possible to define a very simple translation from PVS-Core derivations to
PVS-Cert derivable judgements (Definition 7 and Theorem 11). Indeed, using
$\equiv_{\beta*}$ ensures that two PVS-Cert types (resp. expressions) are convertible as long
as the corresponding types (resp. expressions) in PVS-Core are also convertible.

The reduction $\to_{\beta*}$ underlying conversion does not preserve typing: for
instance, the judgement $x : \mathrm{Prop}, h : x \vdash \langle x, h\rangle_T : T$ with $T = \{y : \mathrm{Prop} \mid y\}$
is derivable, and $\langle x, h\rangle_T \rhd_{\beta*} x$, but $x : \mathrm{Prop}, h : x \vdash x : T$ is not derivable.
However, as presented in Sect. 6, the reduction $\to_{\beta\sigma}$ is type preserving, and will
be used both as a definition of cut elimination for PVS-Cert proofs (Sect. 7) and
in the definition of a type-checking algorithm (Sect. 8).
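
The derivability claim in this example can be spelled out as follows; this is a reconstruction from the PAIR rule of Sect. 3.1, not a derivation given in the paper. Writing $\Gamma$ for the context $x : \mathrm{Prop}, h : x$ and $T$ for $\{y : \mathrm{Prop} \mid y\}$, the last inference step is:

```latex
\[
\dfrac{\Gamma \vdash x : \mathrm{Prop}
       \qquad \Gamma \vdash h : y[x/y]
       \qquad \Gamma \vdash \{y : \mathrm{Prop} \mid y\} : \mathrm{Type}}
      {\Gamma \vdash \langle x, h\rangle_T : \{y : \mathrm{Prop} \mid y\}}
\ \text{PAIR}
\]
```

Since $y[x/y] = x$, the first two premises follow from VAR, and the third from SUBTYPE. Conversely, one way to see that $\Gamma \vdash x : T$ is not derivable is to combine it with $\Gamma \vdash x : \mathrm{Prop}$: by uniqueness of types (Theorem 2) this would give $\mathrm{Prop} \equiv_{\beta*} T$, which the Church-Rosser property (Theorem 1) rules out because both terms are distinct normal forms.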


4 Properties of PVS-Cert 


One of the most important properties satisfied by PVS-Cert is the Church-Rosser 
property. 


Theorem 1 (Church-Rosser for $\to_{\beta*}$). Whenever $M_1 \equiv_{\beta*} M_2$, there exists $N$
such that $M_1 \twoheadrightarrow_{\beta*} N$ and $M_2 \twoheadrightarrow_{\beta*} N$.


Proof. $\mathcal{T}$ equipped with $\to_{\beta*}$ is an orthogonal combinatory reduction system (as
defined in [14]), as rules are left-linear and non-overlapping. As proved in [14], such
a system admits the Church-Rosser property.


In the case of PTSs, the Church-Rosser property of $\to_\beta$ is at the core of the
type preservation of $\to_\beta$. In the case of PVS-Cert, the situation is different, as
$\to_{\beta*}$ is not a type preserving reduction. However, in a first step, the Church-Rosser
property of $\to_{\beta*}$ will be used to establish the expected stratification theorem,
presented in Sect. 5. In a second step, the Church-Rosser property of $\to_{\beta*}$ will be used
again together with the stratification theorem to establish the type preservation of
an alternative reduction, $\to_{\beta\sigma}$, used both as a definition of cut elimination (Sect. 7)
and at the core of the definition of a type-checking algorithm (Sect. 8).

Another important property of PVS-Cert used to design a type-checking algorithm
is the uniqueness of types modulo conversion. As presented in Sect. 8, this
property makes it possible, together with the decidability of $\equiv_{\beta*}$ on well-typed
terms, to reduce the problem of type-checking to a problem of type inference. This
property also underlines the fact that, even though PVS-Cert is designed to reflect
predicate subtyping, it does not admit any subtyping itself. The proof of type
uniqueness is standard, and does not involve any specific difficulty.

Theorem 2 (Uniqueness of types). If two judgements $\Gamma \vdash M : T_0$ and
$\Gamma \vdash M : T_1$ are derivable, then $T_0 \equiv_{\beta*} T_1$.


PVS-Cert also satisfies several other standard properties expected from PTSs 
and PTSs extended with dependent pairs, among which thinning and substitution, 
described for instance in [4], as well as context conversion, described for instance in 
[21], which is based on the extension of conversion to contexts. In these three cases, 
the corresponding proofs are straightforwardly adapted from the case of PTS. 

We end this section with the following important theorem, which also holds in
λ-HOL. The proof is adapted from the case of λ-HOL and does not involve any
specific difficulty.


Theorem 3. If $\Gamma \vdash M : T$ is derivable and $T \neq \mathrm{Kind}$, there exists a sort $s$ such
that $\Gamma \vdash T : s$.


5 Stratification in PVS-Cert 


The stratification of terms in PVS-Cert reveals a strong link between PVS-Cert
and PVS-Core (defined in Sect. 9), in the same way that the stratification of terms
in λ-HOL reveals its link with higher-order logic. The property of stratification
holds for several other systems, such as the injective PTSs presented in [11] (in that
paper, PTSs are referred to as GTSs, and this result is referred to as classification).

The main lemma used to establish such a result is the fact that, whenever the
rule of conversion is used in some derivation, the two terms involved in the
conversion belong to the same class of terms. The simplest way to prove this result is
to choose classes of terms that are stable under reduction and to conclude using
the Church-Rosser theorem. In the case of injective PTSs, these classes are specific
classes of well-typed terms, and the stability under reduction follows from the type
preservation of $\to_\beta$.

However, as mentioned in Sect. 3.3, type preservation does not hold for $\to_{\beta*}$ in
PVS-Cert. For this reason, we choose a relaxed definition of stratified terms,
where the different classes are not restricted to well-typed terms. Using this relaxed
definition, it is possible to prove, even in the absence of type preservation for
$\to_{\beta*}$, that most classes of stratified terms are stable under reduction with $\to_{\beta*}$.

We first present three classes of terms: types, expressions, and proofs. The
expected property of stability under reduction will only be proved for types and
expressions (Proposition 1), which is not problematic as the conversion rules are
never directly applied to proofs in valid derivations.


Definition 1 (Variables stratification). We introduce the notations:


- $X, Y, Z$ for variables in $\mathcal{V}_{types}$
- $x, y, z$ for variables in $\mathcal{V}_{expressions}$
- $h$ for variables in $\mathcal{V}_{proofs}$

Definition 2 (Stratified terms). We define stratified terms as follows.


- Types $A, B := X \mid \mathrm{Prop} \mid \Pi x : A.B \mid \{x : A \mid P\}$
- Expressions
  $t, u, P, Q := x \mid \Pi x : A.P \mid \Pi h : P.Q \mid \lambda x : A.t \mid t\,u \mid \langle t, M\rangle_A \mid \pi_1(t)$
- Proofs $p, q := h \mid \lambda h : P.p \mid \lambda x : A.p \mid p\,q \mid p\,t \mid \pi_2(t)$


Remark 2. As in the case of PVS-Core (Remark 1), there is no formal distinction 
between the notations t, u, P, and Q although, in the following, the notations of 
expressions P, Q will be preferred for expressions of type Prop. 


The most important remark on the definition of stratified terms is the fact that
any pair $\langle t, M\rangle_A$ (where $t$ is an expression and $A$ is a type) is accepted as a correct
expression: the term $M$ used in it can be arbitrary, and in particular it is not
required to be a proof term. This choice is due to the fact that proofs are not stable
under $\to_{\beta*}$: for instance, $(\lambda h : x.h)\,y$ is a proof, but its reduct $y$ is not. Hence,
compared to the alternative of restricting pairs to terms of the form $\langle t, p\rangle_A$, the
present relaxed definition is necessary to ensure the stability of types and expressions
under $\to_{\beta*}$, which is formalized in the following proposition. The proof does not
involve any specific difficulty, as the definitions of types and expressions are designed
to satisfy this property.


Proposition 1. Whenever $M \to_{\beta*} N$ and $M$ is a type (resp. an expression), so
is $N$.
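
The three classes of Definition 2 can be read as mutually recursive predicates on raw terms. The sketch below is an illustration only: the encoding is hypothetical (variables carry their class explicitly as `"type"`, `"expr"`, or `"proof"`), and the names `is_type`, `is_expr`, and `is_proof` are not from the paper. It reproduces the example above, where a proof reduces to a term that is not a proof.

```python
from dataclasses import dataclass

# Hypothetical encoding; a variable carries its class: "type", "expr" or "proof".
@dataclass(frozen=True)
class Var:
    name: str
    cls: str

@dataclass(frozen=True)
class Prop:
    pass

@dataclass(frozen=True)
class Pi:              # product binding var over dom in cod
    var: Var
    dom: object
    cod: object

@dataclass(frozen=True)
class Lam:             # abstraction binding var over dom in body
    var: Var
    dom: object
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Subtype:         # {var : dom | pred}
    var: Var
    dom: object
    pred: object

@dataclass(frozen=True)
class Pair:            # <fst, snd>_ty
    fst: object
    snd: object
    ty: object

@dataclass(frozen=True)
class Proj:            # pi_index(arg)
    index: int
    arg: object

def is_type(m):
    if isinstance(m, Var): return m.cls == "type"
    if isinstance(m, Prop): return True
    if isinstance(m, Pi): return m.var.cls == "expr" and is_type(m.dom) and is_type(m.cod)
    if isinstance(m, Subtype): return m.var.cls == "expr" and is_type(m.dom) and is_expr(m.pred)
    return False

def is_expr(m):
    if isinstance(m, Var): return m.cls == "expr"
    if isinstance(m, Pi):
        if m.var.cls == "expr": return is_type(m.dom) and is_expr(m.cod)
        if m.var.cls == "proof": return is_expr(m.dom) and is_expr(m.cod)
        return False
    if isinstance(m, Lam): return m.var.cls == "expr" and is_type(m.dom) and is_expr(m.body)
    if isinstance(m, App): return is_expr(m.fun) and is_expr(m.arg)
    if isinstance(m, Pair): return is_expr(m.fst) and is_type(m.ty)  # snd is arbitrary
    if isinstance(m, Proj): return m.index == 1 and is_expr(m.arg)
    return False

def is_proof(m):
    if isinstance(m, Var): return m.cls == "proof"
    if isinstance(m, Lam):
        if m.var.cls == "proof": return is_expr(m.dom) and is_proof(m.body)
        if m.var.cls == "expr": return is_type(m.dom) and is_proof(m.body)
        return False
    if isinstance(m, App): return is_proof(m.fun) and (is_proof(m.arg) or is_expr(m.arg))
    if isinstance(m, Proj): return m.index == 2 and is_expr(m.arg)
    return False

x, y, h = Var("x", "expr"), Var("y", "expr"), Var("h", "proof")
redex = App(Lam(h, x, h), y)           # (lambda h : x. h) y, which beta-reduces to y
print(is_proof(redex), is_proof(y))    # the redex is a proof, its reduct is not
```

Because `is_expr` places no constraint on the second component of a pair, a pair containing `redex` stays an expression after that component is reduced, which is the point of the relaxed definition.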


Beyond its use in the proof of the stratification theorem (Theorem 4), this stability
property is also directly useful in the proof of the strong normalization theorem
for $\to_{\beta\sigma}$ and $\to_{\beta*}$, as briefly mentioned in Sect. 7.

Finally, we present the expected stratification theorem, based on the following 
definitions. 


Definition 3 (Stratified contexts, stratified judgements). We define


- stratified contexts as contexts in which all declarations have the form
  $X : \mathrm{Type}$, $x : A$ (for some type $A$), or $h : P$ (for some expression $P$).

- stratified judgements as judgements of one of the following forms, in which
  $\Gamma$ is a stratified context:


  $\Gamma \vdash \mathrm{WF}$      $\Gamma \vdash \mathrm{Type} : \mathrm{Kind}$
  $\Gamma \vdash A : \mathrm{Type}$      $\Gamma \vdash t : A$
  $\Gamma \vdash p : P$


Theorem 4 (Stratification). Any derivable judgement is stratified. 


Proof. The proof is straightforward by induction on the derivation. In the case
of CONVERSION, Proposition 1 and the Church-Rosser property of $\to_{\beta*}$ are used
together to conclude that the two convertible terms are either both expressions,
both types, both Type, or both Kind. Basic stability properties of types and
expressions under substitution are also involved in the cases PROJ2 and APP. They
are proved directly by induction.


6 A Type Preserving Reduction


Contrary to the case of PTSs (resp. PTSs with dependent pairs), in which $\to_\beta$
(resp. $\to_{\beta\sigma}$) is a type preserving reduction, $\to_{\beta*}$ is not a type preserving
reduction in PVS-Cert. Instead, we present in this section the type preservation of the
reduction $\to_{\beta\sigma}$ in PVS-Cert. This reduction will be used both as a definition of
cut elimination for PVS-Cert proofs (Sect. 7) and in the type-checking algorithm
(Sect. 8).

The specificity of this proof of type preservation compared to similar results
for PTSs lies in the fact that $M \to_{\beta\sigma} N$ does not imply $M \equiv_{\beta*} N$ in general.
However, this implication always holds if $M$ is either a type or an expression; the
corresponding proof involves no particular difficulty.


Theorem 5. Whenever $M \to_{\beta\sigma} N$ and $M$ is a type (resp. an expression), so is
$N$, and $M \equiv_{\beta*} N$.


Finally, the type preservation theorem for $\to_{\beta\sigma}$ is the following.


Theorem 6. Given a derivable judgement $\Gamma \vdash M : T$, and $N$ such that $M \to_{\beta\sigma} N$,
the judgement $\Gamma \vdash N : T$ is derivable.


Proof. The proof is done by induction on the derivation. The situations where
$M \not\rhd_{\beta\sigma} N$ (the reduction does not occur at the head of $M$) and the cases where
$M \rhd_{\beta\sigma} N$ are separated. We present here one case for each situation; the full
proof can be found in the author's PhD dissertation [1].


- We illustrate the situation where $M \not\rhd_{\beta\sigma} N$ with the case of the rule PROD,
  which involves Theorem 5. Discarding the notations of the original statement,
  we describe the last inference step with the following new notations:

  \[
  \dfrac{\Gamma \vdash T : s_1 \qquad \Gamma, v : T \vdash U : s_2}{\Gamma \vdash \Pi v : T.U : s_3}\ (s_1, s_2, s_3) \in \mathcal{R}\ \ \text{PROD}
  \]

  If the reduction occurs in $U$, we conclude directly by induction hypothesis. If the
  reduction occurs in $T$, we write $T \to_{\beta\sigma} T'$. By induction hypothesis, $\Gamma \vdash T' : s_1$
  is derivable. By the stratification theorem, $v \in \mathcal{V}_{s_1}$, hence $\Gamma, v : T' \vdash \mathrm{WF}$ is
  derivable using the DECL rule. By the stratification theorem and Theorem 5,
  $T \equiv_{\beta*} T'$. Hence, using the second premise and context conversion (mentioned
  in Sect. 4), $\Gamma, v : T' \vdash U : s_2$ is derivable. Finally, using PROD,
  $\Gamma \vdash \Pi v : T'.U : s_3$ is derivable.

- We illustrate the situation where $M \rhd_{\beta\sigma} N$ with the case of the rule PROJ1. As
  $M$ is a first projection and $M \rhd_{\beta\sigma} N$, $M$ is a $\sigma$-redex. We replace the notations
  $M \rhd_{\beta\sigma} N$ and $T$ of the original statement by $\pi_1\langle M, N\rangle_T \rhd_{\beta\sigma} M$ and $T'$. In this
  setting, the last inference step has the following form:

  \[
  \dfrac{\Gamma \vdash \langle M, N\rangle_T : \{v : T' \mid U'\}}{\Gamma \vdash \pi_1\langle M, N\rangle_T : T'}\ \text{PROJ1}
  \]

  Analyzing the derivation of the premise (and more precisely the last rule different
  from CONVERSION used in it, which is necessarily PAIR), we conclude that
  $T$ has the form $\{v : T'' \mid U''\}$ where $\{v : T' \mid U'\} \equiv_{\beta*} \{v : T'' \mid U''\}$ and
  $\Gamma \vdash \langle M, N\rangle_T : \{v : T'' \mid U''\}$ admits a derivation ending with an inference step
  of the form

  \[
  \dfrac{\Gamma \vdash M : T'' \qquad \Gamma \vdash N : U''[M/v] \qquad \Gamma \vdash \{v : T'' \mid U''\} : \mathrm{Type}}{\Gamma \vdash \langle M, N\rangle_T : \{v : T'' \mid U''\}}\ \text{PAIR}
  \]

  We derive the expected judgement $\Gamma \vdash M : T'$ from the first premise of this
  latter derivation using conversion. For this, we need to prove $T'' \equiv_{\beta*} T'$ and
  to derive $\Gamma \vdash T' : s$ for some $s$. These two requirements are proved as follows.
  On the one hand, we establish $T'' \equiv_{\beta*} T'$ from $\{v : T'' \mid U''\} \equiv_{\beta*} \{v : T' \mid U'\}$
  using the Church-Rosser property (Theorem 1). On the other hand, by the
  stratification theorem, $T' \neq \mathrm{Kind}$, hence we can use Theorem 3 on the original
  conclusion to establish that $\Gamma \vdash T' : s$ is derivable for some sort $s$, as expected.


7 Strong Normalization and Cut Elimination 


This section is dedicated to the strong normalization of both $\to_{\beta\sigma}$ and $\to_{\beta*}$ on
well-typed PVS-Cert terms. These two reductions will be used separately in Sect. 8 to
define a type-checking algorithm for PVS-Cert: more precisely, the reduction $\to_{\beta*}$
is used to decide whether two well-typed terms are convertible with $\equiv_{\beta*}$, while
the type preserving reduction $\to_{\beta\sigma}$ will be used in the type-checking of applications.
Moreover, the strong normalization of $\to_{\beta\sigma}$ combined with its type preservation
property provides a cut elimination theorem, which is a powerful tool to
study properties of both PVS-Cert and PVS-Core. Its use is illustrated in a proof
of consistency of PVS-Cert (Theorem 9), used in turn to establish the consistency
of PVS-Core (Theorem 12) at the end of this paper.


7.1 Strong Normalization 


A direct approach to prove the strong normalization of $\to_{\beta\sigma}$ and $\to_{\beta*}$ for
well-typed terms would be to prove the strong normalization for well-typed terms of
their union, referred to as $\to_{\beta\sigma*}$. Unfortunately, this reduction is not strongly
terminating on well-typed terms, as shown in the following proposition.


Proposition 2. There exists a well-typed term admitting an infinite reduction
using $\to_{\beta\sigma*}$.


Proof. We first define two well-typed terms $M$ and $N$ such that $M\,N$ admits an
infinite reduction. It is simple to find two such terms, using the fact that PVS-Cert
is an extension of System F [12]. For instance:


- We take $T = \Pi P : \mathrm{Prop}.\Pi h : P.P$ together with $M = \lambda h : T.h\,T\,h$ and
  $N = \lambda h' : T.\lambda h : T.h\,T\,h$.

- $M$ admits the type $\Pi h : T.T$ and $N$ admits the type $\Pi h' : T.\Pi h : T.T$.

- $M\,N$ admits an infinite reduction $M\,N \to_{\beta\sigma*} N\,T\,N \twoheadrightarrow_{\beta\sigma*} M\,N \to_{\beta\sigma*} \cdots$


Using these terms, we build the expected counter-example of normalization of
$\to_{\beta\sigma*}$ as follows:

- We define $N' = \lambda P : \mathrm{Prop}.\lambda h : P.h$, $T' = \{x : \mathrm{Prop} \mid \Pi h' : T.\Pi h : T.T\}$, and
  $U = \{y : T' \mid T\}$.

- It is straightforward to show that $M\,\pi_2\langle\langle T, N\rangle_{T'}, N'\rangle_U$ admits the type $T$.

- $M\,\pi_2\langle\langle T, N\rangle_{T'}, N'\rangle_U \twoheadrightarrow_{\beta\sigma*} M\,N$, hence it admits an infinite reduction.
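
The reduction of this counter-example can be unfolded step by step. The following is a reconstruction rather than text from the paper; it assumes the reading in which $T'$ denotes the predicate subtype $\{x : \mathrm{Prop} \mid \Pi h' : T.\Pi h : T.T\}$ and $U$ denotes $\{y : T' \mid T\}$, with $M$, $N$, and $N'$ as defined in the proof.

```latex
\begin{align*}
M\,\pi_2\langle\langle T, N\rangle_{T'}, N'\rangle_U
  &\to_{\beta\sigma*} M\,\pi_2\langle T, N\rangle_{T'}
     && \text{($\rhd_*$ on the outer pair)}\\
  &\to_{\beta\sigma*} M\,N
     && \text{($\rhd_\sigma$ on the second projection)}\\
  &\to_{\beta\sigma*} N\,T\,N
     && \text{($\rhd_\beta$, unfolding $M$)}\\
  &\to_{\beta\sigma*} M\,N
     && \text{($N\,T \rhd_\beta M$, since $h'$ does not occur in the body of $N$)}
\end{align*}
```

The first step uses $\rhd_*$ and the second uses $\rhd_\sigma$, so the loop is only reachable when both reductions are combined: the $\rhd_*$ step replaces the well-typed second component $N'$ by $N$, which does not have the type $T$ expected by $M$.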


Because of Proposition 2, we keep the expected strong normalization theorem 
in PVS-Cert formulated as follows. 


Theorem 7 (Strong normalization). For any derivable judgement $\Gamma \vdash M : T$,
$M$ is strongly normalizing under both $\to_{\beta\sigma}$ and $\to_{\beta*}$:


- any reduction sequence starting from $M$ and using $\to_{\beta\sigma}$ terminates
- any reduction sequence starting from $M$ and using $\to_{\beta*}$ terminates


The proof of this theorem is left out of the scope of this paper. It is detailed in 
the author’s PhD dissertation [1]. We simply highlight here some of its specifici- 
ties, which illustrate the consequences of the choice, in PVS-Cert, of a conversion 
relation which is not based on a type-preserving reduction. 


- The proof uses Tait's approach based on saturated sets (see for instance [23]).
  However, only one single notion of saturated set is used: saturated sets are
  defined here as specific subsets of the set of terms which are both strongly
  normalizing under $\to_{\beta\sigma}$ and strongly normalizing under $\to_{\beta*}$. As a consequence,
  compatibility properties for such saturated sets must be proved with respect to
  both reductions.

- Following Tait's approach, an interpretation function is defined in order to
  prove that, whenever a term $M$ admits a type $T$, it belongs to the interpretation
  of $T$, which is the main theorem established to conclude strong normalization.
  The definition of this function is inspired by the definitions of Girard in [12]
  for the strong normalization of $F^\omega$ (which corresponds to λ-HOL without type
  declarations), but several ideas are also taken from [10], which presents, among
  other things, a proof of strong normalization of an extension of the calculus of
  constructions with dependent pairs.

- As the interpretation function is expected to be stable under $\to_{\beta*}$, its domain
  cannot be restricted to well-typed terms only, as well-typed terms are not stable
  under $\to_{\beta*}$. For this reason, this interpretation function is defined on
  the classes of types and expressions, as presented in the definition of stratified
  terms (Definition 2): indeed, this specific definition, which uses arbitrary terms
  instead of proof terms in the construction $\langle t, M\rangle_A$, is designed to ensure the
  stability of types and expressions under $\to_{\beta*}$.


7.2 Cut Elimination in PVS-Cert 


The following cut elimination theorem is a direct corollary of the strong
normalization theorem and the type preservation of $\to_{\beta\sigma}$.


Theorem 8 (Cut elimination). Whenever some PVS-Cert judgement of the
form $\Gamma \vdash p : P$ is derivable for some proposition $P$ and some proof $p$, $p$ can
be reduced using the reduction $\to_{\beta\sigma}$ to a normal form $q$ such that the judgement
$\Gamma \vdash q : P$ is derivable.

Proof. By the strong normalization theorem, $p$ can be reduced to a normal form
$q$ using the reduction $\to_{\beta\sigma}$. By the type preservation theorem (Theorem 6), the
judgement $\Gamma \vdash q : P$ is derivable.


We conclude this section by showing how the cut elimination theorem can be used
together with the properties of terms in normal form with respect to $\to_{\beta\sigma}$ as a tool
to analyze some meta-theoretical properties of PVS-Cert. As presented at the end
of this work, this approach also makes it possible to use cut elimination in PVS-Cert
to analyze some meta-theoretical properties of PVS-Core. This use of cut elimination
is illustrated with the following proof of consistency.


Theorem 9. PVS-Cert is consistent: there exists no proof term $p$ such that
$\vdash p : \Pi x : \mathrm{Prop}.x$ is derivable.


We use the following notion of elimination context in the proof: 


Definition 4 (Elimination contexts). We define the set of elimination contexts
$\mathcal{E}$ with the grammar $e := \bullet \mid \pi_i(e) \mid e\,M$.
For any term $N$ we define the instantiation $e[N]$ by


$\bullet[N] = N$      $\pi_i(e)[N] = \pi_i(e[N])$      $(e\,M)[N] = (e[N])\,M$
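
Instantiation is a direct structural recursion on the elimination context. The sketch below illustrates it under simplifying assumptions: the class names (`Hole`, `ProjCtx`, `AppCtx`) and the function `instantiate` are hypothetical, and terms are rendered as nested tuples rather than as a full term datatype.

```python
from dataclasses import dataclass

# Hypothetical encoding of elimination contexts e := hole | pi_i(e) | e M.
@dataclass(frozen=True)
class Hole:
    pass

@dataclass(frozen=True)
class ProjCtx:         # pi_index(inner)
    index: int
    inner: object

@dataclass(frozen=True)
class AppCtx:          # inner applied to an arbitrary term arg
    inner: object
    arg: object

def instantiate(e, n):
    """Compute e[n]; terms are rendered as nested tuples in this sketch."""
    if isinstance(e, Hole):
        return n
    if isinstance(e, ProjCtx):
        return ("proj", e.index, instantiate(e.inner, n))
    if isinstance(e, AppCtx):
        return ("app", instantiate(e.inner, n), e.arg)
    raise TypeError("not an elimination context")

# The context (pi_2(hole)) t, instantiated with the variable x, yields (pi_2(x)) t.
context = AppCtx(ProjCtx(2, Hole()), "t")
print(instantiate(context, "x"))
```

The hole always sits in head position, which is why a term of the form $e[v]$ is a variable to which a sequence of projections and applications has been applied.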


Proof (Theorem 9). We suppose that there exists a proof $p$ such that the judgement
$\vdash p : \Pi x : \mathrm{Prop}.x$ admits some derivation, and find a contradiction in the
following way. Using the thinning property (mentioned in Sect. 4),
$x : \mathrm{Prop} \vdash p : \Pi x : \mathrm{Prop}.x$ is also derivable. Hence, applying the rule APP followed by
the rule LAM, $\vdash \lambda x : \mathrm{Prop}.(p\,x) : \Pi x : \mathrm{Prop}.x$ is derivable.

By the cut elimination theorem (Theorem 8), $\lambda x : \mathrm{Prop}.(p\,x)$ admits a normal form
$\lambda x : \mathrm{Prop}.q$ with respect to $\to_{\beta\sigma}$, which is such that the judgement
$\vdash \lambda x : \mathrm{Prop}.q : \Pi x : \mathrm{Prop}.x$ is derivable.

Considering the last rule different from CONVERSION used in such a derivation
(which is necessarily LAM), and using the stratification theorem, there exists
a derivable judgement $x : \mathrm{Prop} \vdash q : t$ for some expression $t \equiv_{\beta*} x$. Hence, using
CONVERSION, $x : \mathrm{Prop} \vdash q : x$ is also derivable. We consider $D$ a possible derivation
of this judgement.

As $q$ is a proof and is in normal form with respect to $\to_{\beta\sigma}$, we conclude from
a careful case analysis that $q$ has one of the following forms: $\lambda v : T.M$ or $e[v]$.
We discard the first possibility as follows. If $q = \lambda v : T.M$, considering the last
rule different from CONVERSION used in $D$ (which is necessarily LAM), there exists
some term of the form $\Pi v' : T'.U'$ such that $\Pi v' : T'.U' \equiv_{\beta*} x$. By the
Church-Rosser property (Theorem 1), this conversion cannot hold. As a consequence, $q$
has the form $e[v]$ for some elimination context $e$ and some variable $v$.

Considering the last rule different from CONVERSION, PROJ1, PROJ2, or APP
used in $D$ (which is necessarily VAR), some judgement of the form $x : \mathrm{Prop} \vdash v : T$
is derivable, and $v = x$. As $q$ is a proof, $e[x] = q \neq x$. Hence, $D$ admits some
subderivation of a judgement of the form $x : \mathrm{Prop} \vdash x\,t' : T'$ or $x : \mathrm{Prop} \vdash \pi_i(x) : T'$.
Considering the last rule different from CONVERSION in such a derivation, and
using the uniqueness of types (Theorem 2), this implies that there exists a term $U$
of the form $\Pi v' : T_1.T_2$ or $\{v' : T_1 \mid T_2\}$ such that $U \equiv_{\beta*} \mathrm{Prop}$. By the
Church-Rosser property (Theorem 1), this conversion cannot hold. As a consequence, there
exists no proof term $p$ such that the judgement $\vdash p : \Pi x : \mathrm{Prop}.x$ is derivable.


8 Type-Checking in PVS-Cert 


The purpose of this section is to present the main ideas leading to the definition of 
a type-checking algorithm for PVS-Cert. The decidability of type-checking is one 
of the most important results expected for PVS-Cert. In particular, it will be used 
in Sect. 10 together with the translation from PVS-Core derivations to PVS-Cert 
established in Sect. 9 to show that PVS-Cert judgements can be used as verifiable 
certificates for PVS-Core. 

This algorithm is mainly based on the type preservation Theorem 6 and the 
strong normalization Theorem 7 presented in the previous sections. In this section, 
we will only focus on the main specificities of the algorithm. Its precise definition, 
as well as the proofs of its soundness, termination, and completeness can be found 
in the author’s PhD dissertation [1]. 

The algorithm is comparable to the algorithm presented in [6] for the general 
case of injective PTSs (which applies to λ-HOL). Besides the fact that our 
algorithm is extended to handle predicate subtypes, coercions ⟨M, N⟩_T and projections 
πi(M), the main difference between the two is the use of both reductions →β* and 
→βσ in the case of PVS-Cert, while only →β is used for injective PTSs. 

On the one hand, →β*-normalization is used to check ≡β*-conversion on 
well-typed terms: by the Church-Rosser property and strong normalization, two 
well-typed terms are ≡β*-equivalent if and only if they admit the same normal form, 
which is unique. As in [6], this decision procedure for conversion on well-typed 
terms is used in turn together with the uniqueness of types (Theorem 2) to define 
type-checking from type inference, which is itself defined recursively. 


Remark 3. In order to avoid redundant context well-formedness verifications in 
the multiple recursive calls of the type inference algorithm, we choose here to check 
the well-formedness of a context Γ beforehand when inferring a type for some term 
M in Γ. For this reason, type inference and type-checking are defined in two steps. 
First, we define auxiliary type inference and type-checking algorithms which are 
only ensured to operate soundly with well-formed contexts. Then, we use these 
auxiliary functions to define context well-formedness verification as well as com- 
plete type inference and type-checking algorithms, which operate soundly with any 
context. 


On the other hand, →βσ is used in type inference to handle applications: 

$$\frac{\Gamma \vdash M : \Pi v : T_1.T_2 \qquad \Gamma \vdash N : T_1}{\Gamma \vdash M\,N : T_2[N/v]}\ \textsc{App}$$

In this situation, the recursive call on the first premise may produce a term 
U such that Γ ⊢ M : U is derivable, but U is not ensured to have the form 
Πv : U1.U2: counterexamples can be easily found when M is a proof and U is 
a proposition. The usual solution to this issue, used e.g. in [6], is to reduce U using 
the reduction underlying conversion (or more specifically its restriction to weak 
head reduction, which is more economical): indeed, using the uniqueness of types as 
well as strong normalization, type preservation, and the Church-Rosser property, 
it can be proved that a term U' will be obtained, that M admits the type U', and 
that U' has the form Πv : U1.U2 if M admits a type of this form. 

However, in the case of PVS-Cert, this approach cannot be followed directly, 
as the reduction underlying conversion, which is →β*, is not type preserving: U' 
is not necessarily a valid type for M. For this reason, we use instead the type 
preserving reduction →βσ (again, we use more specifically its restriction to weak head 
reduction, which is more economical). Using the strong normalization theorem, this 
operation terminates and yields some term U''. As a direct corollary of type 
preservation (based on Theorems 3 and 5), M admits the type U''. What is left is to prove 
that U'' has the form Πv : U1.U2 if M admits a type of this form, which is done 
as follows. If M admits a type of the form Πv : T1.T2, then U'' ≡β* Πv : T1.T2 
by the uniqueness of types. Hence, analyzing the possible forms of the weak head 
normal form U'' and using the Church-Rosser property, we conclude that U'' has 
the form Πv : U1.U2, as expected. 


Compared to [6], new cases must be added for predicate subtypes, coercions 
⟨M, N⟩_T, and projections πi(M). These cases are handled in a similar way as in the 
case of PTSs with dependent pairs (see for instance ECC [16]), and don't involve 
any specific difficulty. Instead, a more distinctive specificity of the algorithm lies 
in the case of λ-abstraction: 

$$\frac{\Gamma, v : T \vdash M : U \qquad \Gamma \vdash \Pi v : T.U : s}{\Gamma \vdash \lambda v : T.M : \Pi v : T.U}\ \textsc{Lam}$$


As in the case of injective PTSs studied in [6], applying a recursive call on this 
second premise would be problematic. On the one hand, it would make the algo- 
rithm slower. On the other hand, it would break the simplicity of the proof of termi- 
nation, based on the fact that recursive calls of type inference are done on subterms 
exclusively. 

A general solution for this issue, applicable to any injective PTS, is presented 
in [6] using some classification of terms to avoid this unwanted recursive call. The 
solution selected for PVS-Cert follows the same approach, adapted to the 
stratified terms of PVS-Cert. It relies on a classifying algorithm LEVEL(·), which ensures 
that whenever M is an expression, a type, Type, or Kind, then LEVEL(M) 
is 1, 2, 3, or 4 respectively. As it is specifically suited to PVS-Cert, this 
definition is simpler than the classification presented in [6], which is intended to be 
applicable to a wide family of type systems. The algorithm is defined as follows: 


Definition 5. We define the algorithm LEVEL(·) by recursion on its argument. 
The possible cases are the following. 

- LEVEL(Kind) = 4, LEVEL(Type) = 3, LEVEL(Prop) = 2 

- LEVEL(Πv : T.U) = LEVEL(U), LEVEL({v : T | U}) = 2, LEVEL(X) = 2 

- In all other cases, LEVEL(M) = 1 
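
The classification of Definition 5 can be sketched in a few lines of Python. The term representation below (the classes Sort, TypeVar, Var, Pi, Subtype, Lam) is a hypothetical encoding introduced only for illustration; in particular, it assumes that type variables X and expression variables x are already distinguished syntactically.

```python
from dataclasses import dataclass

# Hypothetical AST for PVS-Cert terms (illustrative names, not from the paper).
@dataclass(frozen=True)
class Sort:            # Kind, Type or Prop
    name: str

@dataclass(frozen=True)
class TypeVar:         # a type variable X
    name: str

@dataclass(frozen=True)
class Var:             # an expression or proof variable
    name: str

@dataclass(frozen=True)
class Pi:              # Πv : T.U
    var: str
    dom: object
    cod: object

@dataclass(frozen=True)
class Subtype:         # {v : T | U}
    var: str
    base: object
    pred: object

@dataclass(frozen=True)
class Lam:             # λv : T.M
    var: str
    dom: object
    body: object

def level(m):
    """LEVEL(M): 1 for expressions, 2 for types, 3 for Type, 4 for Kind."""
    if isinstance(m, Sort):
        return {"Kind": 4, "Type": 3, "Prop": 2}[m.name]
    if isinstance(m, Pi):
        return level(m.cod)          # LEVEL(Πv : T.U) = LEVEL(U)
    if isinstance(m, (Subtype, TypeVar)):
        return 2
    return 1                         # all other cases
```

The recursion only descends into the codomain of a product, so the function terminates on any finite term and never requires a typing derivation.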


9 Expressing PVS-Core in PVS-Cert 


The final purpose of PVS-Cert is to encode PVS-Core derivations as PVS-Cert 
judgements, and to use the type-checking algorithm presented in Sect.8 to use 
these judgements as verifiable certificates. In this perspective, we define a corre- 
spondence between PVS-Core and PVS-Cert. This correspondence reflects the fact 
that, even though these two systems are very different at the level of terms and 
judgements, they are almost identical at the level of derivations. 


9.1 An Erasing Function from PVS-Cert to PVS-Core 


We begin the description of this correspondence with a translation from PVS-Cert 
to PVS-Core, referred to as erasing. This translation mainly consists in the erasure 
of PVS-Cert explicit coercions ⟨·, M⟩_A and π1(·). 


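Definition 6. We define an erasure function ⟦·⟧ from PVS-Cert expressions, 
types, and Type to PVS-Core terms recursively as follows. 

$$\begin{array}{lll}
\llbracket \mathit{Type} \rrbracket = \mathit{Type} & \llbracket x \rrbracket = x & \llbracket \langle t, M \rangle_A \rrbracket = \llbracket t \rrbracket \\
\llbracket \mathit{Prop} \rrbracket = \mathit{Prop} & \llbracket \lambda x : A.t \rrbracket = \lambda x : \llbracket A \rrbracket.\llbracket t \rrbracket & \llbracket \pi_1(t) \rrbracket = \llbracket t \rrbracket \\
\llbracket X \rrbracket = X & \llbracket t\,u \rrbracket = \llbracket t \rrbracket\,\llbracket u \rrbracket & \\
\llbracket \Pi x : A.B \rrbracket = \Pi x : \llbracket A \rrbracket.\llbracket B \rrbracket & \llbracket \Pi x : A.P \rrbracket = \forall x : \llbracket A \rrbracket.\llbracket P \rrbracket & \\
\llbracket \{x : A \mid P\} \rrbracket = \{x : \llbracket A \rrbracket \mid \llbracket P \rrbracket\} & \llbracket \Pi h : P.Q \rrbracket = \llbracket P \rrbracket \Rightarrow \llbracket Q \rrbracket &
\end{array}$$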


Then, we extend straightforwardly ⟦·⟧ from PVS-Cert stratified contexts to 
PVS-Core contexts: for instance, ⟦Γ, x : A, X : Type⟧ = ⟦Γ⟧, x : ⟦A⟧, X : Type. 

Last, we extend straightforwardly ⟦·⟧ from all PVS-Cert stratified judgements 
except those of the form Γ ⊢ Type : Kind to PVS-Core judgements. For instance, 
⟦x : A, X : Type ⊢ p : P⟧ = x : ⟦A⟧, X : Type ⊢ ⟦P⟧. The PVS-Cert judgements 
of the form Γ ⊢ Type : Kind are not translated. 
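
As an illustration, the term-level erasure of Definition 6 can be sketched in Python. Everything about the encoding is an assumption made for this sketch: terms are tagged tuples, the output is a PVS-Core term rendered as a string, and the three readings of a product (Π, ∀, ⇒) are told apart with the LEVEL classification of Definition 5 rather than with a typing derivation.

```python
# Hypothetical tagged-tuple encoding of PVS-Cert terms (illustrative only):
#   ("sort", "Type" | "Prop" | "Kind"), ("tvar", "X"), ("var", "x"),
#   ("lam", x, A, t), ("app", t, u), ("pair", t, M, A) for <t, M>_A,
#   ("proj1", t), ("pi", x, A, B), ("sub", x, A, P) for {x : A | P}

def level(m):
    """LEVEL of Definition 5 on the tuple encoding."""
    tag = m[0]
    if tag == "sort":
        return {"Kind": 4, "Type": 3, "Prop": 2}[m[1]]
    if tag == "pi":
        return level(m[3])
    if tag in ("sub", "tvar"):
        return 2
    return 1

def erase(m):
    """Erase coercions and proofs, returning a PVS-Core term as a string."""
    tag = m[0]
    if tag == "sort":
        if m[1] == "Kind":
            raise ValueError("erasure is not defined on Kind")
        return m[1]
    if tag in ("tvar", "var"):
        return m[1]
    if tag == "lam":
        _, x, a, t = m
        return f"(λ{x} : {erase(a)}. {erase(t)})"
    if tag == "app":
        return f"({erase(m[1])} {erase(m[2])})"
    if tag in ("pair", "proj1"):
        return erase(m[1])           # drop the coercion, keep the underlying term
    if tag == "sub":
        _, x, a, p = m
        return f"{{{x} : {erase(a)} | {erase(p)}}}"
    if tag == "pi":
        _, x, a, b = m
        if level(b) == 2:            # Πx : A.B with B a type
            return f"(Π{x} : {erase(a)}. {erase(b)})"
        if level(a) == 2:            # Πx : A.P with A a type, P a proposition
            return f"(∀{x} : {erase(a)}. {erase(b)})"
        return f"({erase(a)} ⇒ {erase(b)})"   # Πh : P.Q, the proof variable disappears
    raise ValueError(f"unknown term: {m!r}")
```

For example, erase(("pi", "x", ("sort", "Prop"), ("var", "x"))) yields the string "(∀x : Prop. x)", matching the equation ⟦Πx : Prop.x⟧ = ∀x : Prop.x.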


By the stratification theorem in PVS-Cert, all PVS-Cert derivable judgements 
are stratified judgements. Hence, unless they have the form Γ ⊢ Type : Kind, 
their erasure in PVS-Core is well-defined. We will prove in Theorem 10 that these 
erasures are derivable in PVS-Core. This theorem relies in particular on the fact that 
conversion in PVS-Cert and PVS-Core are related through the erasure function ⟦·⟧, 
established in the following proposition. The corresponding proof does not involve 
any specific difficulty. 


Proposition 3. For all terms M and N which are either expressions, types, or 
Type, whenever M ≡β* N, then ⟦M⟧ ≡β ⟦N⟧. 


Using the two previous propositions and the stratification theorem in 
PVS-Cert, we conclude the following theorem, which allows PVS-Cert derivations 
to be mapped to PVS-Core derivations. 


Theorem 10. Every derivable PVS-Cert judgement either has the form 
Γ ⊢ Type : Kind or admits an image through ⟦·⟧. In the latter case, this image is 
derivable in PVS-Core. 


Proof. The first part of the proof is a direct consequence of the stratification 
theorem. The second part is proved by induction on the height of PVS-Cert derivations. 
All cases are straightforward, using the stratification theorem when necessary to 
establish a correspondence between stratified versions of PVS-Cert rules and 
PVS-Core rules. For instance: 


- DECL corresponds either to TYPEDECL, ELTDECL, or ASSUMPTION 

- SORT corresponds to PROP only (judgements of the form Γ ⊢ Type : Kind are 
not translated) 

- PROD corresponds either to PI, FORALL, or IMPLY 


9.2 Expressing PVS-Core Derivations as PVS-Cert Judgements 


Theorem 10 shows that a PVS-Cert derivable judgement can testify to the PVS- 
Core derivability of another judgement: its erasure. In this section, we show con- 
versely that, given any PVS-Core derivation, we can build such a PVS-Cert judge- 
ment. For this purpose, we first present an algorithm CERTIFICATE, which trans- 
lates a PVS-Core derivation into a PVS-Cert judgement. In a second step, we will 
prove that such PVS-Cert judgements are always derivable in PVS-Cert. 


Definition 7. For any PVS-Core derivation D, we define recursively the 
PVS-Cert stratified judgement CERTIFICATE(D) such that ⟦CERTIFICATE(D)⟧ 
corresponds to the conclusion of D. 

In this definition, we use an injective function h(·) mapping natural numbers to 
PVS-Cert proof variables, which can be chosen arbitrarily. We present two cases: 
ASSUMPTION, which shows how h(·) is used, and IMPLYELIM. This latter case (as 
well as FORALLELIM) is more complex than others as it involves the computation 
of a normal form with respect to ▷∗, i.e. the erasure of coercions at the head of a 
term. The other cases are detailed in the author’s PhD dissertation [1]. 


$$\frac{\Gamma \vdash P : \mathit{Prop}}{\Gamma, P \vdash \mathrm{WF}}\ \textsc{Assumption}$$

We consider D1 the derivation of Γ ⊢ P : Prop. CERTIFICATE(D1) has the 
form Γ1 ⊢ P1 : Prop. We consider n the number of declarations of the form 
(h : Q) in Γ1, and we define CERTIFICATE(D) = Γ1, h(n) : P1 ⊢ WF. 

$$\frac{\Gamma \vdash P \Rightarrow Q \qquad \Gamma \vdash P}{\Gamma \vdash Q}\ \textsc{ImplyElim}$$

We consider D1 and D2 the respective derivations of Γ ⊢ P ⇒ Q and Γ ⊢ P. 
CERTIFICATE(D2) has the form Γ2 ⊢ p2 : P2 and CERTIFICATE(D1) has the 
form Γ1 ⊢ p1 : Q1. As ⟦Q1⟧ = (P ⇒ Q), its normal form with respect to ▷∗ 
has the form Πh : P1'.Q1'. We define CERTIFICATE(D) = Γ1 ⊢ p1 p2 : Q1'[p2/h]. 
As all proof terms are deleted through the erasure function, ⟦Q1'[p2/h]⟧ = ⟦Q1'⟧. 
On the other hand, by induction hypothesis, ⟦Q1'⟧ = Q, hence the erasure of this 
judgement is Γ ⊢ Q, as expected. 


9.3 Relating Conversion in PVS-Core and PVS-Cert 


In order to prove that the outputs of the algorithm CERTIFICATE are derivable in 
PVS-Cert (presented in Theorem 11), the main required lemma is the converse of 
Proposition 3: for any terms M and N which are either expressions, 
types, or Type and which verify ⟦M⟧ ≡β ⟦N⟧, we have M ≡β* N. More precisely, this 
property will be used in the proof of Theorem 11 to handle the cases of conversion 
rules TYPECONVERSION and PROPCONVERSION. 

We first establish a modified version of this expected result, using equality and 
≡∗ instead of ≡β and ≡β*, respectively. The proof is straightforward by induction 
on the two involved terms. 


Proposition 4. For all terms M and N which are either expressions, types, or 
Type, whenever ⟦M⟧ = ⟦N⟧, then M ≡∗ N. 


Then, we establish the expected converse of Proposition 3 as follows. 


Proposition 5. For all terms M and N which are either expressions, types, or 
Type, whenever ⟦M⟧ ≡β ⟦N⟧, then M ≡β* N. 


Proof. We present a proof based on the definition of a simple translation ⌈·⌉ of 
PVS-Core terms as PVS-Cert expressions, types, or Type, which does not introduce any 
explicit coercion: for instance, 


- ⌈Πx : A.B⌉ = Πx : ⌈A⌉.⌈B⌉ 
- ⌈P ⇒ Q⌉ = Πh : ⌈P⌉.⌈Q⌉ for an arbitrary proof variable h 


We first show straightforwardly that the respective images through ⌈·⌉ of two 
terms related by ≡β are also related by ≡β*. As a consequence, ⌈⟦M⟧⌉ ≡β* ⌈⟦N⟧⌉. 

On the other hand, it is straightforward to show that ⌈·⌉ is a right inverse of the 
erasure function ⟦·⟧. Hence, ⟦⌈⟦M⟧⌉⟧ = ⟦M⟧. By Proposition 4, we conclude that 
⌈⟦M⟧⌉ ≡∗ M. Following the same reasoning, ⌈⟦N⟧⌉ ≡∗ N. 

As a consequence, M ≡β* ⌈⟦M⟧⌉ ≡β* ⌈⟦N⟧⌉ ≡β* N. 


9.4 Soundness of the Synthesis of Certificates 


The last proposition needed to prove the soundness of the algorithm CERTIFICATE 
is the following. It shows that the operation of normalization through ▷∗ (which 
erases the coercions π1(·) and ⟨·, M⟩_T at the head of a term) is safely used in the 
definition of CERTIFICATE. 


Proposition 6. For any derivable PVS-Cert judgement of the form 
Γ ⊢ t : {xn : ... {x1 : Prop | Q1} ... | Qn}, if t admits a normal form with respect to ▷∗ which 
has the form Πv : M.T, then Γ ⊢ Πv : M.T : Prop is derivable. 

In fact, only the specific case n = 0 is used in the proof of soundness of 
CERTIFICATE, but this generalization is preferred as it admits a direct proof by 
induction on t, which does not involve any specific difficulty. 

Last, we present the expected soundness property for CERTIFICATE: 


Theorem 11. For any PVS-Core derivation D, CERTIFICATE(D) is derivable in 
PVS-Cert. 


Proof. The proof is done by induction on D. Most cases are proved without any 
specific difficulty. In particular, the cases of conversion rules TYPECONVERSION 
and PROPCONVERSION are straightforward using Proposition 5. 

The most complex cases correspond to the rules IMPLYELIM and FORALLELIM 
which involve, by definition of CERTIFICATE, some normalization with respect to 
▷∗. In such cases, Proposition 6 is used to handle the specific difficulties related to 
this normalization. We present the case IMPLYELIM: 


$$\frac{\Gamma \vdash P \Rightarrow Q \qquad \Gamma \vdash P}{\Gamma \vdash Q}\ \textsc{ImplyElim}$$

We consider D1 and D2 the respective derivations of Γ ⊢ P ⇒ Q and Γ ⊢ P. 
CERTIFICATE(D2) has the form Γ2 ⊢ p2 : P2 and CERTIFICATE(D1) has the form 
Γ1 ⊢ p1 : Q1. As ⟦Q1⟧ = (P ⇒ Q), its normal form with respect to ▷∗ has the 
form Πh : P1'.Q1'. In this setting, CERTIFICATE(D) = Γ1 ⊢ p1 p2 : Q1'[p2/h]. 
By induction hypothesis, Γ1 ⊢ p1 : Q1 and Γ2 ⊢ p2 : P2 are derivable in 
PVS-Cert. By Proposition 3 and the stratification theorem, Γ1 ⊢ Q1 : Prop is derivable 
in PVS-Cert. Hence, by Proposition 6, Γ1 ⊢ Πh : P1'.Q1' : Prop is derivable as 
well. As Q1 ≡β* Πh : P1'.Q1', we conclude applying the CONVERSION rule that 
Γ1 ⊢ p1 : Πh : P1'.Q1' is derivable. 

On the other hand, using Proposition 4, we can conclude from ⟦Γ1⟧ = Γ = ⟦Γ2⟧ 
that Γ1 ≡∗ Γ2 as long as both contexts admit the same list of declared proof variables, 
in the same order. This is the case as, by straightforward induction on PVS-Core 
derivations, this list is h(1), h(2), ..., h(n), where h(·) is the injective function used 
in the definition of CERTIFICATE and n is the number of proof variable declarations 
in Γ1 and Γ2. Hence, Γ1 ≡∗ Γ2. 

As Γ1 ⊢ p1 : Πh : P1'.Q1' is derivable, by Theorem 3 and the stratification 
theorem, Γ1 ⊢ Πh : P1'.Q1' : Prop is derivable. Hence, considering the last rule 
different from CONVERSION used in such a derivation (which is necessarily PROD), and 
using the stratification theorem, Γ1 ⊢ P1' : Prop is derivable as well. As a 
consequence, using context conversion (mentioned in Sect. 4), Γ1 ⊢ p2 : P1' is derivable 
in PVS-Cert. Hence, applying the rule APP, Γ1 ⊢ p1 p2 : Q1'[p2/h] is derivable, as 
expected. 


10 Using PVS-Cert as a System of Verifiable Certificates for PVS-Core 


This final section shows how to use the different results presented in this paper to 
answer the main question addressed in the current work: defining a system of 
verifiable certificates for PVS-Core. 

A PVS-Cert judgement Γ ⊢ p : P can be used as a certificate for its 
PVS-Core erasure ⟦Γ⟧ ⊢ ⟦P⟧ (Definition 6), which is verifiable using the type-checking 
algorithm presented in Sect. 8. On the one hand, this approach is sound: whenever 
the type-checking algorithm succeeds, Γ ⊢ p : P is derivable in PVS-Cert, hence 
⟦Γ⟧ ⊢ ⟦P⟧ is derivable in PVS-Core by Theorem 10. 

On the other hand, valid certificates can be generated for arbitrary PVS-Core 
theorems in the following way. Given some PVS-Core judgement Δ ⊢ Q 
derivable through some derivation D, the PVS-Cert judgement CERTIFICATE(D) can 
be used as a certificate of Δ ⊢ Q. Indeed, using the notation Γ ⊢ p : P for 
CERTIFICATE(D), the following statements hold. 


- By definition of CERTIFICATE, ⟦Γ⟧ = Δ and ⟦P⟧ = Q, hence this judgement is 
a certificate for Δ ⊢ Q. 

- By Theorem 11, Γ ⊢ p : P is derivable, hence the execution of the type-checking 
algorithm on this judgement succeeds: this certificate is valid. 


These PVS-Cert certificates represent PVS-Core derivations in a very com- 
pact way. As each of the different constructions of types, expressions, and proofs in 
PVS-Cert corresponds to some PVS-Core derivation rule, the size of a PVS-Cert 
certificate is comparable, as a rough estimation, with the size of a corresponding 
PVS-Core derivation in which all PVS-Core judgements are deleted. 

We finally show that, through the construction of certificates, the PVS-Cert 
cut elimination theorem can be used to study meta-theoretical properties of PVS- 
Core. This possible use is illustrated with the case of consistency, proved in PVS- 
Cert in Theorem 9 using cut elimination. 


Theorem 12. The system PVS-Core is consistent: the judgement ⊢ ∀x : Prop.x 
is not derivable. 


Proof. If the judgement ⊢ ∀x : Prop.x admits a PVS-Core derivation D, we 
consider ⊢ p : P = CERTIFICATE(D). By definition, ⟦P⟧ = ∀x : Prop.x = ⟦Πx : Prop.x⟧. 
Hence, by Proposition 5, P ≡β* Πx : Prop.x. As ⊢ Πx : Prop.x : Prop 
is derivable in PVS-Cert, we can apply the conversion rule to conclude that 
⊢ p : Πx : Prop.x is derivable in PVS-Cert, which is impossible by Theorem 9. 


Abstract. Secure compilers generate compiled code that withstands 
many target-level attacks such as alteration of control flow, data leaks 
or memory corruption. Many existing secure compilers are proven to 
be fully abstract, meaning that they reflect and preserve observational 
equivalence. Fully abstract compilation is strong and useful but, in cer- 
tain cases, comes at the cost of requiring expensive runtime constructs in 
compiled code. These constructs may have no relevance for security, but 
are needed to accommodate differences between the source and target 
languages that fully abstract compilation necessarily needs. 

As an alternative to fully abstract compilation, this paper explores a 
different criterion for secure compilation called robustly safe compilation 
or RSC. Briefly, this criterion means that the compiled code preserves 
relevant safety properties of the source program against all adversarial 
contexts interacting with the compiled program. We show that RSC can 
be proved more easily than fully abstract compilation and also often 
results in more efficient code. We also develop two illustrative robustly- 
safe compilers and, through them, illustrate two different proof 
techniques for establishing that a compiler attains RSC. Based on these, we 
argue that proving RSC can be simpler than proving full abstraction. 


To better explain and clarify notions, this paper uses colours. For a 
better experience, please print or view this paper in colours.¹ 


¹ Specifically, in this paper we use a blue, sans-serif font for source elements, an 
orange, bold font for target elements and a black, italic font for elements 
common to both languages (to avoid repeating similar definitions twice). Thus, C 
(blue, sans-serif) is a source-level component, C (orange, bold) is a target-level 
component and C (black, italic) is generic notation for either a source-level or a 
target-level component. 


1 Introduction 


Low-level adversaries, such as those written in C or assembly, can attack 
co-linked code written in a high-level language in ways that may not be feasible in 
the high-level language itself. For example, such an adversary may manipulate 
or hijack control flow, cause buffer overflows, or directly access private memory, 
all in contravention of the abstractions of the high-level language. Specific 
countermeasures such as Control Flow Integrity [3] or Code Pointer Integrity [41] 
have been devised to address some of these attacks individually. An alterna- 
tive approach is to devise a secure compiler, which seeks to defend against 
entire classes of such attacks. Secure compilers often achieve security by relying 
on different protection mechanisms, e.g., cryptographic primitives [4,5,22,26], 
types [10,11], address space layout randomisation [6,37], protected module 
architectures [9,53,57,59] (also known as enclaves [46]), tagged architectures [7,39], etc. 
Once designed, the question researchers face is how to formalise that such a com- 
piler is indeed secure, and how to prove this. Basically, we want a criterion that 
specifies secure compilation. A widely-used criterion for compiler security is fully 
abstract compilation (FAC) [2,35,52], which has been shown to preserve many 
interesting security properties like confidentiality, integrity, invariant definitions, 
well-bracketed control flow and hiding of local state [9,37,53, 54]. 

Informally, a compiler is fully abstract if it preserves and reflects observa- 
tional equivalence of source-level components (i.e., partial programs) in their 
compiled counterparts. Most existing work instantiates observational equivalence 
with contextual equivalence: co-divergence of two components in any larger con- 
text they interact with. Fully abstract compilation is a very strong property, 
which preserves all source-level abstractions. 

Unfortunately, preserving all source-level abstractions also has downsides. In 
fact, while FAC preserves many relevant security properties, it also preserves a 
plethora of other non-security ones, and the latter may force inefficient checks in 
the compiled code. For example, when the target is assembly, two observationally 
equivalent components must compile to code of the same size [9,53], else full 
abstraction is trivially violated. This requirement is security-irrelevant in most 
cases. Additionally, FAC is not well-suited for source languages with undefined 
behaviour (e.g., C and LLVM) [39] and, if used naively, it can fail to preserve even 
simple safety properties [60] (though, fortunately, no existing work falls prey to 
this naivety). 

Motivated by this, recent work has started investigating alternative secure 
compilation criteria that overcome these limitations. These security-focussed criteria 
take the form of preservation of hyperproperties or classes of hyperproperties, 
such as hypersafety properties or safety properties [8,33]. This paper investigates 
one of these criteria, namely, Robustly Safe Compilation (RSC) which has clear 
security guarantees and can often be attained more efficiently than FAC. 

Informally, a compiler attains RSC if it is correct and it preserves robust 
safety of source components in the target components it produces. Robust safety 
is an important security notion that has been widely adopted to formalize secu- 
rity, e.g., of communication protocols [14,17,34]. Before explaining RSC, we 
explain robust safety as a language property. 


Robust Safety as a Language Property. Informally, a program property is a safety 
property if it encodes that “bad” sequences of events do not happen when the 
program executes [13,63]. A program is robustly safe if it has relevant (specified) 
safety properties despite active attacks from adversaries. As the name suggests, 
robust safety relies on the notions of safety and robustness which we now explain. 


Safety. As mentioned, safety asserts that “no bad sequence of events happens”, 
so we can specify a safety property by the set of finite observations which char- 
acterise all bad sequences of events. A whole program has a safety property if 
its behaviours exclude these bad observations. Many security properties can be 
encoded as safety, including integrity, weak secrecy and functional correctness. 


Example 1 (Integrity). Integrity ensures that an attacker does not tamper with 
code invariants on state. For example, consider the function charge_account(n) 
which deducts amount n from an account as part of an electronic card payment. A 
card PIN is required if n is larger than 10 euros. So the function checks whether n 
> 10, requests the PIN if this is the case, and then changes the account balance. 
We expect this function to have a safety (integrity) property in the account 
balance: A reduction of more than 10 euros in the account balance must be 
preceded by a call to request_pin(). Here, the relevant observation is a trace 
(sequence) of account balances and calls to request_pin(). Bad observations for 
this safety property are those where an account balance is more than 10 euros less 
than the previous one, without a call to request_pin() in between. Note that 
this function seems to have this safety property, but it may not have the safety 
property robustly: a target-level adversary may transfer control directly to the 
“else” branch of the check n > 10 after setting n to more than 10, to violate the 
safety property. 
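
To make the notion of a bad observation concrete, here is a minimal Python sketch of a checker for the integrity property of Example 1. The event encoding (the string "request_pin" and pairs ("balance", amount)) and the function name are assumptions made for illustration; the check follows the "more than 10 euros" reading of the property.

```python
def violates_integrity(trace, limit=10):
    """Return True if the finite trace is a bad observation for Example 1.

    `trace` is a list of events, each either the string "request_pin" or a
    pair ("balance", amount) recording the account balance (hypothetical
    encoding). A trace is bad if some balance is more than `limit` lower
    than the previous balance with no "request_pin" event in between.
    """
    previous = None      # last observed balance, if any
    pin_seen = False     # was request_pin() called since the last balance?
    for event in trace:
        if event == "request_pin":
            pin_seen = True
            continue
        _, balance = event
        if previous is not None and previous - balance > limit and not pin_seen:
            return True  # a bad prefix has been observed; no extension can repair it
        previous = balance
        pin_seen = False
    return False
```

Because the function returns as soon as a bad prefix is found, it reflects the defining feature of safety properties: a violation is always witnessed by a finite observation.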


Example 2 (Weak Secrecy). Weak secrecy asserts that a program secret never 
flows explicitly to the attacker. For example, consider code that manages 
network_h, a handler (socket descriptor) for a sensitive network interface. This 
code does not expose network_h directly to external code but it provides an 
API to use it. This API makes some security checks internally. If the handler 
is directly accessible to outer code, then it can be misused in insecure ways 
(since the security checks may not be made). If the code has weak secrecy wrt 
network_h then we know that the handler is never passed to an attacker. In 
this case we can define bad observations as those where network_h is passed to 
external code (e.g., as a parameter, as a return value, or on the heap). 


Example 3 (Correctness). Program correctness can also be formalized as a safety 
property. Consider a program that computes the nth Fibonacci number. The 
program reads n from an input source and writes its output to an output source. 
Correctness of this program is a safety property. Our observations are pairs of an 
input (read by the program) and the corresponding output. A bad observation 
is one where the input is n (for some n) but the output is different from the nth 
Fibonacci number. 
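
The bad observations of this example can be written down directly. The sketch below is illustrative only: the function names are hypothetical, and it assumes the indexing convention fib(0) = 0, fib(1) = 1, which the text does not fix.

```python
def fib(n):
    """Reference nth Fibonacci number, assuming fib(0) = 0 and fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def is_bad_observation(observation):
    """An observation is an (input, output) pair; it is bad if the output
    differs from the Fibonacci number at the given input."""
    n, output = observation
    return output != fib(n)
```

The property says nothing about how the output is computed, so any implementation (iterative, recursive, memoised) satisfies it as long as no bad pair is ever observed.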


These examples not only illustrate the expressiveness of safety properties, but 
also show that safety properties are quite coarse-grained: they are only concerned 
with (sequences of) relevant events like calls to specific functions, changes to 


472 M. Patrignani and D. Garg 


specific heap variables, inputs, and outputs. They do not specify or constrain how 
the program computes between these events, leaving the programmer and the 
compiler considerable flexibility in optimizations. However, safety properties are 
not a panacea for security, and there are security properties that are not safety. 
For example, noninterference [70,72], the standard information flow property, 
is not safety. Nonetheless, many interesting security properties are safety. In 
fact, many non-safety properties including noninterference can be conservatively 
approximated as safety properties [20]. Hence, safety properties are a meaningful 
goal to pursue for secure compilation. 


Robustness. We often want to reason about properties of a component of inter- 
est that hold irrespective of any other components the component interacts with. 
These other components may be the libraries the component is linked against, 
or the language runtime. Often, these surrounding components are modelled as 
the program context whose hole the component of interest fills. From a security 
perspective the context represents the attacker in the threat model. When the 
component of interest links to a context, we have a whole program that can run. 
A property holds robustly for a component if it holds in any context that the 
component of interest can be linked to. 
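
The quantification over contexts can be illustrated with a small Python sketch. All names below are hypothetical, and checking a finite list of contexts can only refute robustness, never establish it; a proof must cover every context the component can be linked to.

```python
def component(n):
    """Toy component: emits the events produced when charging amount n."""
    events = []
    if n > 10:
        events.append("request_pin")
    events.append(f"debit:{n}")
    return events

def is_bad(trace):
    """Bad trace: a debit of more than 10 not directly preceded by request_pin."""
    for i, event in enumerate(trace):
        if event.startswith("debit:") and int(event.split(":")[1]) > 10:
            if i == 0 or trace[i - 1] != "request_pin":
                return True
    return False

def holds_on(contexts, comp):
    """Check the property on each whole program obtained by linking comp to a context."""
    return all(not is_bad(context(comp)) for context in contexts)

# Contexts that only call the component, as a source-level attacker would.
source_like = [lambda c: c(5), lambda c: c(50), lambda c: c(5) + c(20)]

# A context modelling a lower-level attacker that bypasses the check entirely.
bypassing = [lambda c: ["debit:50"]]
```

Here holds_on(source_like, component) is true while holds_on(bypassing, component) is false, which mirrors the gap between source contexts and more powerful target contexts.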


Robust Safety Preservation as a Compiler Property. A compiler attains robustly 
safe compilation or RSC if it maps any source component that has a safety 
property robustly to a compiled component that has the same safety property 
robustly. Thus, safety has to hold robustly in the target language, which often 
does not have the powerful abstractions (e.g., typing) that the source language 
has. Hence, the compiler must insert enough defensive runtime checks into the 
compiled code to prevent the more powerful target contexts from launching 
attacks (violations of safety properties) that source contexts could not launch. 
This is unlike correct compilation, which either considers only those target con- 
texts that behave like source contexts [40,49,65] or considers only whole pro- 
grams [43]. 

As mentioned, safety properties are usually quite coarse-grained. This means 
that RSC still allows the compiler to optimise code internally, as long as the 
sequence of observable events is not affected. For example, when compiling the 
fibonacci function of Example 3, the compiler can do any internal optimisation 
such as caching intermediate results, as long as the end result is correct. Cru- 
cially, however, these intermediate results must be protected from tampering by 
a (target-level) attacker, else the output can be incorrect, breaking RSC. 

An RSC-attaining compiler focuses only on preserving security (as captured 
by robust safety) instead of contextual equivalence (typically captured by full 
abstraction). So, such a compiler can produce code that is more efficient than 
code compiled with a fully abstract compiler as it does not have to preserve all 
source abstractions (we illustrate this later). 

Finally, robust safety scales naturally to thread-based concurrency [1,34,58]. 
Thus RSC also scales naturally to thread-based concurrency (we demonstrate 
this too). This is unlike FAC, where thread-based concurrency can introduce 
additional undesired abstractions that also need to be preserved. 

RSC is a very recently proposed criterion for secure compilers. Recent 
work [8,33] defines RSC abstractly in terms of preservation of program 
behaviours, but its development is limited to the definition only. Our goal 
in this paper is to examine how RSC can be realized and established, and to 
show that in certain cases it leads to compiled code that is more efficient than 
what FAC leads to. To this end, we consider a specific setting where observa- 
tions are values in specific (sensitive) heap locations at cross-component calls. 
We define robust safety and RSC for this specific setting (Sect. 2). Unlike pre- 
vious work [8,33] which assumed that the domain of traces (behaviours) is the 
same in the source and target languages, our RSC definition allows for different 
trace domains in the source and target languages, as long as they can be suit- 
ably related. The second contribution of our paper is two proof techniques to 
establish RSC. 


— The first technique is an adaptation of trace-based backtranslation, an existing 
technique for proving FAC [7,9,59]. To illustrate this technique, we build a 
compiler from an untyped source language to an untyped target language with 
support for fine-grained memory protection via so-called capabilities [23,71] 
(Sect.3). Here, we guarantee that if a source program is robustly safe, then 
so is its compilation. 

— The second proof technique shows that if source programs are verified for 
robust safety, then one can simplify the proof of RSC so that no backtrans- 
lation is needed. In this case, we develop a compiler from a typed source 
language where the types already enforce robust safety, to a target language 
similar to that of the first compiler (Sect. 4). In this instance, both languages 
also support shared-memory concurrency. Here, we guarantee that all com- 
piled target programs are robustly safe. 


To argue that RSC is general and is not limited to compilation targets based 
on capabilities, we also develop a third compiler. This compiler starts from the 
same source language as our second compiler but targets an untyped concurrent 
language with support for coarse-grained memory isolation, modelling recent 
hardware extensions such as Intel’s SGX [46]. Due to space constraints, we report 
this result only in the companion technical report [61]. 

The final contribution of this paper is a comparison between RSC and FAC. 
For this, we describe changes that would be needed to attain FAC for the first 
compiler and argue that these changes make generated code inefficient and also 
complicate the backtranslation proof significantly (Sect. 5). 

Due to space constraints, we elide some technical details and limit proofs to 
sketches. These are fully resolved in the companion technical report [61]. 
2 Robustly Safe Compilation 


This section first discusses robust safety as a language (not a compiler) property 
(Sect. 2.1) and then presents RSC as a compiler property along with an informal 
discussion of techniques to prove it (Sect. 2.2). 


2.1 Safety and Robust Safety 


To explain robust safety, we first describe a general imperative programming 
model that we use. Programmers write components on which they want to 
enforce safety properties robustly. A component is a list of function definitions 
that can be linked with other components (the context) in order to have a 
runnable whole program (functions in “other” components are like extern func- 
tions in C). Additionally, every component declares a set of “sensitive” locations 
that contain all the data that is safety-relevant. For instance, in Example 1 this 
set may contain the account balance and in Example 3 it may contain the I/O 
buffers. We explain the relevance of this set after we define safety properties. 

We want safety properties to specify that a component never executes a “bad” 
sequence of events. For this, we first need to fix a notion of events. We have 
several choices here, e.g., our events could be inputs and outputs, all syscalls, 
all changes to the heap (as in CompCert [44]), etc. Here, we make a specific 
choice motivated by our interest in robustness: We define events as calls/re- 
turns that cross a component boundary, together with the state of the heap 
at that point. Consequently, our safety properties can constrain the contents of 
the heap at component boundaries. This choice of component boundaries as the 
point of observation is meaningful because, in our programming model, control 
transfers to/from an adversary happen only at component boundaries (more pre- 
cisely, they happen at cross-component function call and returns). This allows 
the compiler complete flexibility in optimizing code within a component, while 
not reducing the ability of safety properties to constrain observations of the 
adversary. 

Concretely, a component behaviour is a trace, i.e., a sequence of actions 
recording component boundary interactions and, in particular, the heap at these 
points. Actions, the items on a trace, have the following grammar: 


Actions $\alpha ::= \mathtt{call}\ f\ v\ H? \;\mid\; \mathtt{call}\ f\ v\ H! \;\mid\; \mathtt{ret}\ H! \;\mid\; \mathtt{ret}\ H?$ 


These actions respectively capture call and callback to a function f with 
parameter v when the heap is H, as well as return and returnback with a certain 
heap H.² We use ? and ! decorations to indicate whether the control flow of the 
action goes from the context to the component (?) or from the component to the 
context (!). Well-formed traces have alternations of ? and ! decorated actions, 
starting with ? since execution starts in the context. 


² A callback is a call from the component to the context, so it generates label 
call f v H!. A returnback is a return from such a callback, i.e., the context returning 
to the component, and it generates the label ret H?. 

For a sequence of actions $\overline{\alpha}$, relevant($\overline{\alpha}$) is the list of heaps H 
mentioned in the actions of $\overline{\alpha}$. 
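The trace conventions above can be illustrated with a small sketch. This is only an illustration under stated assumptions: the `Action` record, the dictionary representation of heaps, and the function names `well_formed` and `relevant` are hypothetical choices made here, not definitions from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: a heap is a dictionary from location names to integer values.
Heap = Dict[str, int]


@dataclass
class Action:
    kind: str                  # "call" or "ret"
    direction: str             # "?" = context to component, "!" = component to context
    heap: Heap                 # heap observed at the component boundary
    fn: Optional[str] = None   # callee name, only meaningful for calls
    arg: Optional[int] = None  # call argument, only meaningful for calls


def well_formed(trace: List[Action]) -> bool:
    """Check that decorations alternate and that the trace starts with '?'."""
    expected = "?"
    for action in trace:
        if action.direction != expected:
            return False
        expected = "!" if expected == "?" else "?"
    return True


def relevant(trace: List[Action]) -> List[Heap]:
    """Return the list of heaps mentioned in the actions of the trace."""
    return [action.heap for action in trace]
```

For example, a trace consisting of a `call ... ?` followed by a `ret ... !` is well formed, whereas a trace that starts with a `!` action is not.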

Next, we need a representation of safety properties. Generally, properties are 
sets of traces, but safety properties specifically can be specified as automata (or 
monitors in the sequel) [63]. We choose this representation since monitors are 
less abstract than sets of traces and they are closer to enforcement mechanisms 
used for safety properties, e.g., runtime monitors. Briefly, a safety property is a 
monitor that transitions states in response to events of the program trace. At 
any point, the monitor may refuse to transition (it gets stuck), which encodes 
property violation. While a monitor can transition, the property has not been 
violated. Schneider [63] argues that all properties codable this way are safety 
properties and that all enforceable safety properties can be coded this way. 

Formally, a monitor $M$ in our setting consists of a set of abstract states 
$\{\sigma\cdots\}$, the transition relation $\leadsto$, an initial state $\sigma_0$, the set of heap 
locations that matter for the monitor, $\{\ell\cdots\}$, and the current state $\sigma_c$ (we 
indicate a set of elements of class $e$ as $\{e\cdots\}$). The transition relation $\leadsto$ is 
a set of triples of the form $(\sigma_s, H, \sigma_f)$ consisting of a starting state $\sigma_s$, a 
final state $\sigma_f$ and a heap $H$. The transition $(\sigma_s, H, \sigma_f)$ is interpreted as 
“state $\sigma_s$ transitions to $\sigma_f$ when the heap is $H$”. When determining the monitor 
transition in response to a program action, we restrict the program’s heap to the 
location set $\{\ell\cdots\}$, i.e., to the set of locations the monitor cares about. This 
heap restriction is written $H|_{\{\ell\cdots\}}$. 

We assume determinism of the transition relation: for any $\sigma_s$ and (restricted 
heap) $H$, there is at most one $\sigma_f$ such that $(\sigma_s, H, \sigma_f) \in\ \leadsto$. 

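Given the behaviour of a program as a trace $\overline{\alpha}$ and a monitor $M$ specifying 
a safety property, $M \vdash \overline{\alpha}$ denotes that the trace satisfies the safety property. 
Intuitively, to satisfy a safety property, the sequence of heaps in the actions of 
a trace must never get the monitor stuck (Rule Valid trace). Every single heap 
must allow the monitor to step according to its transition relation (Rule Monitor 
Step). Note that we overload the $\leadsto$ notation here to also denote an auxiliary 
relation, the monitor small-step semantics (Rule Monitor Step-base and Rule 
Monitor Step-ind). 


$$\text{(Valid trace)}\quad \frac{M; \mathrm{relevant}(\overline{\alpha}) \leadsto M'}{M \vdash \overline{\alpha}}$$

$$\text{(Monitor Step-base)}\quad \frac{}{M; \varnothing \leadsto M}$$

$$\text{(Monitor Step-ind)}\quad \frac{M; \overline{H} \leadsto M'' \qquad M''; H \leadsto M'}{M; \overline{H} \cdot H \leadsto M'}$$

$$\text{(Monitor Step)}\quad \frac{(\sigma_c, H|_{\{\ell\cdots\}}, \sigma_f) \in\ \leadsto}{(\{\sigma\cdots\}, \leadsto, \sigma_0, \{\ell\cdots\}, \sigma_c); H \leadsto (\{\sigma\cdots\}, \leadsto, \sigma_0, \{\ell\cdots\}, \sigma_f)}$$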
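The monitor rules can be read operationally. The following is a minimal sketch, assuming heaps are Python dictionaries and the (deterministic) transition relation is a dictionary keyed by a state and a restricted heap; the names `restrict`, `accepts` and `NON_DECREASING` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

Heap = Dict[str, int]
RestrictedHeap = FrozenSet[Tuple[str, int]]
# A dictionary maps each (state, restricted heap) to at most one next state,
# which gives the determinism assumed for the transition relation.
Transitions = Dict[Tuple[Hashable, RestrictedHeap], Hashable]


def restrict(heap: Heap, locations: Set[str]) -> RestrictedHeap:
    """Keep only the bindings of the locations the monitor cares about."""
    return frozenset((loc, val) for loc, val in heap.items() if loc in locations)


def accepts(transitions: Transitions, current: Hashable,
            locations: Set[str], heaps: Iterable[Heap]) -> bool:
    """Step the monitor through a list of heaps; getting stuck is a violation."""
    state = current
    for heap in heaps:
        key = (state, restrict(heap, locations))
        if key not in transitions:
            return False  # no transition: the monitor is stuck
        state = transitions[key]
    return True  # the empty list is accepted, as in the base rule


# Hypothetical property: the value at "bal" ranges over 0..2 and never
# decreases. The monitor state records the last value seen.
NON_DECREASING: Transitions = {
    (old, frozenset({("bal", new)})): new
    for old in range(3)
    for new in range(old, 3)
}
```

With this encoding, `accepts(NON_DECREASING, 0, {"bal"}, heaps)` plays the role of checking that the list of heaps extracted from a trace never gets the monitor stuck; locations outside `{"bal"}` are ignored by the restriction.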


With this setup in place, we can formalise safety, attackers and robust safety. 
In defining (robust) safety for a component, we only admit monitors (safety 
properties) whose $\{\ell\cdots\}$ agrees with the sensitive locations declared by the 
component. Making the set of safety-relevant locations explicit in the component 
and the monitor gives the compiler more flexibility by telling it precisely 
which locations need to be protected against target-level attacks (the compiler 
may choose to not protect the rest). At the same time, it allows for expressive 
modelling. For instance, in Example 3 the safety-relevant locations could be the 
I/O buffers from which the program performs inputs and outputs, and the safety 
property can constrain the input and output buffers at corresponding call and 
return actions involving the Fibonacci function. 


Definition 1 (Safety, attacker and robust safety). 


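$$M \vdash C : \mathit{safe} \;\stackrel{\mathrm{def}}{=}\; \text{if } \vdash C : \mathit{whole} \text{ then if } \Omega_0(C) \stackrel{\overline{\alpha}}{\Longrightarrow} \_ \text{ then } M \vdash \overline{\alpha}$$

$$C \vdash A : \mathit{atk} \;\stackrel{\mathrm{def}}{=}\; C = \{\ell\cdots\}, \overline{F} \text{ and } \{\ell\cdots\} \cap \mathrm{fn}(A) = \varnothing$$

$$M \vdash C : \mathit{rs} \;\stackrel{\mathrm{def}}{=}\; \forall A.\ \text{if } M \frown C \text{ and } C \vdash A : \mathit{atk} \text{ then } M \vdash A[C] : \mathit{safe}$$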


A whole program $C$ is safe for a monitor $M$, written $M \vdash C : \mathit{safe}$, if the monitor 
accepts any trace the program generates from its initial state ($\Omega_0(C)$). 

An attacker $A$ is valid for a component $C$, written $C \vdash A : \mathit{atk}$, if $A$’s free 
names (denoted $\mathrm{fn}(A)$) do not refer to the locations that the component cares 
about. This is a basic sanity check: if we allow an attacker to mention heap 
locations that the component cares about, the attacker will be able to modify 
those locations, causing all but trivial safety properties to not hold robustly. 

A component $C$ is robustly safe wrt monitor $M$, written $M \vdash C : \mathit{rs}$, if 
$C$ composed with any attacker is safe wrt $M$. As mentioned, for this setup to 
make sense, the monitor and the component must agree on the locations that 
are safety-relevant. This agreement is denoted $M \frown C$. 
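As a rough illustration of the attacker-validity check, the sketch below assumes an attacker is given as a dictionary from function names to the set of location names each function body mentions. This representation and the names `free_names` and `is_valid_attacker` are hypothetical, introduced only for this example.

```python
from typing import Dict, Set


def free_names(attacker: Dict[str, Set[str]]) -> Set[str]:
    """Collect every location name mentioned by any attacker function."""
    names: Set[str] = set()
    for mentioned in attacker.values():
        names |= mentioned
    return names


def is_valid_attacker(sensitive: Set[str], attacker: Dict[str, Set[str]]) -> bool:
    """An attacker is valid if it mentions none of the sensitive locations."""
    return sensitive.isdisjoint(free_names(attacker))
```

An attacker that names a sensitive location directly is rejected up front; robust safety then quantifies only over the attackers that pass this check.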


2.2 Robustly Safe Compilation 


Robustly-safe compilation ensures that robust safety properties and their 
meanings are preserved across compilation. But what does it mean to preserve 
meanings across languages? If a source safety property says never write 3 to a location, 
and we compile to an assembly language by mapping numbers to binary, the 
corresponding target property should say never write 0b11 to an address. 

In order to relate properties across languages, we assume a relation 
$\approx\ : v \times \mathbf{v}$ between source and target values (where both appear together, we 
write target elements in bold) that is total, so it maps any source value $v$ to a 
target value $\mathbf{v}$: $\forall v.\ \exists \mathbf{v}.\ v \approx \mathbf{v}$. This value relation is used to define 
a relation between heaps: $H \approx \mathbf{H}$, which intuitively holds when related locations 
point to related values. This is then used to define a relation between actions: 
$\alpha \approx \boldsymbol{\alpha}$, which holds when the two actions are the “same” modulo this relation, 
i.e., $\mathtt{call} \cdots ?$ only relates to $\mathbf{call} \cdots ?$ and the arguments of the action 
(values and heap) are related. Next, we require a relation $M \approx \mathbf{M}$ between source 
and target monitors, which means that the source monitor $M$ and the target 
monitor $\mathbf{M}$ code the same safety property, modulo the relation $\approx$ on values 
assumed above. The precise definition of this relation depends on the source and 
target languages; specific instances are shown in Sects. 3.3 and 4.3.³ 


³ Accounting for the difference in the representation of safety properties sets us apart 
from recent work [8,33], which assumes that the source and target languages have 
the same trace alphabet. The latter works only in some settings. 
We denote a compiler from language $S$ to language $T$ by $\llbracket\cdot\rrbracket^{S}_{T}$. A compiler 
$\llbracket\cdot\rrbracket^{S}_{T}$ attains RSC if it maps any component $C$ that is robustly safe wrt $M$ to a 
component $\mathbf{C}$ that is robustly safe wrt $\mathbf{M}$, provided that $M \approx \mathbf{M}$. 


Definition 2 (Robustly Safe Compilation). 


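$$\vdash \llbracket\cdot\rrbracket^{S}_{T} : \mathit{RSC} \;\stackrel{\mathrm{def}}{=}\; \forall C, M, \mathbf{M}.\ \text{if } M \vdash C : \mathit{rs} \text{ and } M \approx \mathbf{M} \text{ then } \mathbf{M} \vdash \llbracket C \rrbracket^{S}_{T} : \mathit{rs}$$

A consequence of the universal quantification over monitors here is that the 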
compiler cannot be property-sensitive. A robustly-safe compiler preserves all 
robust safety properties, not just a specific one, e.g., it does not just enforce 
that fibonacci is correct. This seemingly strong goal is sensible as compiler 
writers will likely not know what safety properties individual programmers will 
want to preserve. 


Remark. Some readers may wonder why we do not follow existing work and 
specify safety as “programmer-written assertions never fail” [31,34, 45,68]. Unfor- 
tunately, this approach does not yield a meaningful criterion for specifying a 
compiler, since assertions in the compiled program (if any) are generated by the 
compiler itself. Thus a compiler could just erase all assertions and the compiled 
code it generates would be trivially (robustly) safe — no assertion can fail if there 
are no assertions in the first place! 


Proving RSC. Proving that a compiler attains RSC can be done either by 
proving that a compiler satisfies Definition 2 or by proving something equivalent. 
To this end, Definition 3 below presents an alternative, equivalent formulation of 
RSC. We call this characterisation property-free as it does not mention monitors 
explicitly (it mentions the relevant(·) function for reasons we explain below). 


Definition 3 (Property-Free RSC). 


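$$\vdash \llbracket\cdot\rrbracket^{S}_{T} : \mathit{PF\text{-}RSC} \;\stackrel{\mathrm{def}}{=}\; \forall C, \mathbf{A}, \overline{\boldsymbol{\alpha}}.$$

$$\text{if } \llbracket C \rrbracket^{S}_{T} \vdash \mathbf{A} : \mathit{atk} \text{ and } \vdash \mathbf{A}[\llbracket C \rrbracket^{S}_{T}] : \mathit{whole} \text{ and } \boldsymbol{\Omega}_0(\mathbf{A}[\llbracket C \rrbracket^{S}_{T}]) \stackrel{\overline{\boldsymbol{\alpha}}}{\Longrightarrow} \_$$

$$\text{then } \exists A, \overline{\alpha}.\ C \vdash A : \mathit{atk} \text{ and } \vdash A[C] : \mathit{whole} \text{ and } \Omega_0(A[C]) \stackrel{\overline{\alpha}}{\Longrightarrow} \_$$

$$\text{and } \mathrm{relevant}(\overline{\alpha}) \approx \mathrm{relevant}(\overline{\boldsymbol{\alpha}})$$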


Specifically, PF-RSC states that the compiled code produces behaviours that 
refine source level behaviours robustly (taking contexts into account). 
PF-RSC and RSC should, in general, be equivalent (Proposition 1). 


Proposition 1 (PF-RSC and RSC are equivalent). 
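$$\forall \llbracket\cdot\rrbracket^{S}_{T}.\ \vdash \llbracket\cdot\rrbracket^{S}_{T} : \mathit{PF\text{-}RSC} \iff \vdash \llbracket\cdot\rrbracket^{S}_{T} : \mathit{RSC}$$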


Informally, a property is safety if and only if it implies programs not having any 
trace prefix from a given set of bad prefixes (i.e., finite traces). Hence, not having 
a safety property robustly amounts to some context being able to induce a bad 
prefix. Consequently, preserving all robust safety properties (RSC) amounts to 
ensuring that all target prefixes can be generated (by some context) in the source 
too (PF-RSC). Formally, since Definition 2 relies on the monitor relation, we 
can prove Proposition 1 only after such a relation is finalised. We give such a 
monitor relation and proof in Sect. 3.3 (see Theorem 3). However, in general this 
result should hold for any cross-language monitor relation that correctly relates 
safety properties. If the proposition does not hold, then the relation does not 
capture how safety in one language is represented in the other. 

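Assuming Proposition 1, we can prove PF-RSC for a compiler in place of 
RSC. PF-RSC can be proved with a backtranslation technique. This technique 
has often been used to prove full abstraction [7–9,33,39,50,53,54,59] and it aims 
at building a source context starting from a target one. In fact, PF-RSC leads 
directly to a backtranslation-based proof technique since it can be rewritten 
(eliding irrelevant details) as: 


$$\text{if } \exists \mathbf{A}, \overline{\boldsymbol{\alpha}}.\ \boldsymbol{\Omega}_0(\mathbf{A}[\llbracket C \rrbracket^{S}_{T}]) \stackrel{\overline{\boldsymbol{\alpha}}}{\Longrightarrow} \_$$

$$\text{then } \exists A, \overline{\alpha}.\ \Omega_0(A[C]) \stackrel{\overline{\alpha}}{\Longrightarrow} \_ \text{ and } \mathrm{relevant}(\overline{\alpha}) \approx \mathrm{relevant}(\overline{\boldsymbol{\alpha}})$$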


Essentially, given a target context $\mathbf{A}$, a compiled program $\llbracket C \rrbracket^{S}_{T}$ and a target 
trace $\overline{\boldsymbol{\alpha}}$ that $\mathbf{A}$ causes $\llbracket C \rrbracket^{S}_{T}$ to have, we need to construct, or backtranslate to, 
a source context $A$ that will cause the source program $C$ to simulate $\overline{\boldsymbol{\alpha}}$. Such 
backtranslation-based proofs can be quite difficult, depending on the features of 
the languages and the compiler. However, backtranslation for RSC (as we show 
in Sect. 3.3) is not as complex as backtranslation for FAC (Sect. 5.2). 

A simpler proof strategy is also viable for RSC when we compile only those 
source programs that have been verified to be robustly safe (e.g., using a type 
system). The idea is this: from the verification of the source program, we can find 
an invariant which is always maintained by the target code, and which, in turn, 
implies the robust safety of the target code. For example, if the safety property 
is that values in the heap always have their expected types, then the invariant 
can simply be that values in the target heap are always related to the source 
ones (which have their expected types). This is tantamount to proving type 
preservation in the target in the presence of an active adversary. This is harder 
than standard type preservation (because of the active adversary) but is still 
much easier than backtranslation as there is no need to map target constructs 
to source contexts syntactically. We illustrate this proof technique in Sect. 4. 


RSC Implies Compiler Correctness. As stated in Sect. 1, RSC implies (a 
form of) compiler correctness. While this may not be apparent from Definition 2, 
it is more apparent from its equivalent characterization in Definition 3. We 
elaborate this here. 

Whether concerned with whole programs or partial programs, compiler 
correctness states that the behaviour of compiled programs refines the behaviour 
of source programs [18,36,40,44,49,65]. So, if $\{\overline{\boldsymbol{\alpha}}\cdots\}$ and $\{\overline{\alpha}\cdots\}$ are the sets of 
compiled and source behaviours, then a compiler should force $\{\overline{\boldsymbol{\alpha}}\cdots\} \sqsubseteq \{\overline{\alpha}\cdots\}$, 
where $\sqsubseteq$ is the composition of $\subseteq$ and of the relation $\approx^{-1}$. 

If we consider a source component C that is whole, then it can only link 
against empty contexts, both in the source and in the target. Hence, in this 
special case, PF-RSC simplifies to standard refinement of traces, i.e., whole 
program compiler correctness. Hence, assuming that the correctness criterion for 
a compiler is concerned with the same observations as safety properties (values in 
safety-relevant heap locations at component crossings in our illustrative setting), 
PF-RSC implies whole program compiler correctness. 

However, PF-RSC (or, equivalently, RSC) does not imply, nor is implied by, 
any form of compositional compiler correctness (CCC) [40,49,65]. CCC requires 
that the behaviours produced by a compiled component linked against a target 
context that is related (in behaviour) to a source context can also be produced 
by the source component linked against the related source context. In contrast, 
PF-RSC allows picking any source context to simulate the behaviours. Hence, 
PF-RSC does not imply CCC. On the other hand, PF-RSC universally quan- 
tifies over all target contexts, while CCC only quantifies over target contexts 
related to a source context, so CCC does not imply PF-RSC either. Hence, 
compositional compiler correctness, if desirable, must be imposed in addition to 
PF-RSC. Note that this lack of implications is unsurprising: PF-RSC and CCC 
capture two very different aspects of compilation: security (against all contexts) 
and compositional preservation of behaviour (against well-behaved contexts). 


3 RSC via Trace-Based Backtranslation 


This section illustrates how to prove that a compiler attains RSC by means of a 
trace-based backtranslation technique [7,53,59]. To present such a proof, we first 
introduce our source language $L^U$, an untyped, first-order imperative language 
with abstract references and hidden local state (Sect. 3.1). Then, we present 
our target language $L^P$, an untyped imperative target language with a concrete 
heap, whose locations are natural numbers that the context can compute. $L^P$ 
provides hidden local state via a fine-grained capability mechanism on heap 
accesses (Sect. 3.2). Finally, we present the compiler $\llbracket\cdot\rrbracket^{L^U}_{L^P}$ and prove that it 
attains RSC (Sect. 3.3) by means of a trace-based backtranslation. The section 
concludes with an example detailing why RSC preserves security (Example 4). 

To avoid focussing on mundane details, we deliberately use source and 
get languages that are fairly similar. However, they differ substantially in one 
key point: the heap model. This affords the target-level adversary attacks like 
guessing private locations and writing to them that do not obviously exist in the 
source (and makes our proofs nontrivial). We believe that (with due effort) the 
ideas here will generalize to languages with larger gaps and more features. 


3.1 The Source Language $L^U$ 


$L^U$ is an untyped imperative while language [51]. Components $C$ are triples 
of function definitions, interfaces and a special location written $\ell_{root}$, so 
$C ::= \ell_{root}; \overline{F}; \overline{I}$. Each function definition maps a function name and a formal 
argument to a body $s$: $F ::= f(x) \mapsto s; \mathtt{return};$. An interface is a list of functions 
that the component relies on the context to provide (similar to C’s extern declarations). 
The special location $\ell_{root}$ defines the locations that are monitored for safety, as 
explained below. Attackers $A$ (program contexts) are function definitions that 
represent untrusted code that a component interacts with. A function’s body is a 
statement, $s$. Statements are rather standard, so we omit a formal syntax. Briefly, 
they can manipulate the heap (location creation let x = new e in s, assignment 
x := e), do recursive function calls (call f e), branch (if-then-else), define local 
variables (let-in) and loop. Statements use effect-free expressions, $e$, which 
contain standard boolean expressions ($e \otimes e$), arithmetic expressions ($e \oplus e$), pairing 
($\langle e, e \rangle$) and projections, and location dereference ($!e$). Heaps $H$ are maps from 
abstract locations $\ell$ to values $v$. 

As explained in Sect. 2.1, safety properties are specified by monitors. $L^U$’s 
monitors have the form: $M ::= (\{\sigma\cdots\}, \leadsto, \sigma_0, \ell_{root}, \sigma_c)$. Note that in place of 
the set $\{\ell\cdots\}$ of safety-relevant locations, the description of a monitor here (as 
well as a component above) contains a single location $\ell_{root}$. The interpretation is 
that any location reachable in the heap starting from $\ell_{root}$ is relevant for safety. 
This set of locations can change as the program executes, and hence this is more 
flexible than statically specifying all of $\{\ell\cdots\}$ upfront. This representation of 
the set by a single location is made explicit in the following monitor rule: 


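$$\text{($L^U$-Monitor Step)}\quad \frac{\begin{array}{c} M = (\{\sigma\cdots\}, \leadsto, \sigma_0, \ell_{root}, \sigma_c) \qquad M' = (\{\sigma\cdots\}, \leadsto, \sigma_0, \ell_{root}, \sigma_f) \\ (\sigma_c, H', \sigma_f) \in\ \leadsto \qquad H' \subseteq H \qquad \mathrm{dom}(H') = \mathrm{reach}(\ell_{root}, H) \end{array}}{M; H \leadsto M'}$$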
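The reachability-based restriction used by this rule can be sketched as a simple graph traversal. The encoding below is an assumption made for illustration: locations are wrapped in a hypothetical `Loc` class, pairs are Python tuples, and the root location is taken to be reachable from itself.

```python
from dataclasses import dataclass
from typing import Any, Dict, Set


@dataclass(frozen=True)
class Loc:
    """Hypothetical wrapper for an abstract source location."""
    name: str


def locations_in(value: Any) -> Set[Loc]:
    """Collect the locations occurring in a value (pairs are nested tuples)."""
    if isinstance(value, Loc):
        return {value}
    if isinstance(value, tuple):
        found: Set[Loc] = set()
        for component in value:
            found |= locations_in(component)
        return found
    return set()


def reach(root: Loc, heap: Dict[Loc, Any]) -> Set[Loc]:
    """Locations reachable from root by following locations stored in the heap."""
    seen: Set[Loc] = set()
    todo = [root]
    while todo:
        loc = todo.pop()
        if loc in seen or loc not in heap:
            continue
        seen.add(loc)
        todo.extend(locations_in(heap[loc]))
    return seen


def restrict_to_reachable(root: Loc, heap: Dict[Loc, Any]) -> Dict[Loc, Any]:
    """The sub-heap H' of H whose domain is reach(root, H)."""
    keep = reach(root, heap)
    return {loc: val for loc, val in heap.items() if loc in keep}
```

Because the reachable set is recomputed from the current heap at every step, the set of monitored locations can grow or shrink as the program runs, which is the flexibility the text refers to.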


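Other than this small point, monitors, safety, robust safety and RSC are 
defined as in Sect. 2. In particular, a monitor and a component agree if they 
mention the same $\ell_{root}$: 

$$M \frown C \;\stackrel{\mathrm{def}}{=}\; (M = (\{\sigma\cdots\}, \leadsto, \sigma_0, \ell_{root}, \sigma_c)) \text{ and } (C = (\ell_{root}; \overline{F}; \overline{I}))$$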

A program state $C, H \triangleright (s)_{\overline{f}}$ (denoted with $\Omega$) includes the function bodies $C$, 
the heap $H$, a statement $s$ being executed and a stack of function calls $\overline{f}$ (often 
omitted in the rules for simplicity). The latter is used to populate judgements of 
the form $\overline{I} \vdash f, f' : \mathit{internal}/\mathit{in}/\mathit{out}$. These determine whether calls and returns are 
internal (within the attacker or within the component), directed from the attacker 
to the component (in) or directed from the component to the attacker (out). This 
information is used to determine whether the semantics should generate a label, 
as in Rules EL$^U$-return to EL$^U$-retback, or no label, as in Rules EL$^U$-ret-internal 
and EL$^U$-call-internal, since internal calls should not be observable. $L^U$ has a 
big-step semantics for expressions ($H \triangleright e \hookrightarrow v$) that relies on evaluation contexts, a 
small-step semantics for statements ($\Omega \xrightarrow{\lambda} \Omega'$) that has labels $\lambda ::= \epsilon \mid \alpha$, and 
a semantics that accumulates labels in traces ($\Omega \stackrel{\overline{\alpha}}{\Longrightarrow} \Omega'$) by omitting silent 
actions $\epsilon$ and concatenating the rest. Unlike existing work on compositional 
compiler correctness, which only relies on having the component [40], the semantics 
relies on having both the component and the context. 


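$$\text{(EL$^U$-alloc)}\quad \frac{H \triangleright e \hookrightarrow v \qquad \ell \notin \mathrm{dom}(H)}{C, H \triangleright \mathtt{let}\ x = \mathtt{new}\ e\ \mathtt{in}\ s \xrightarrow{\ \epsilon\ } C, H; \ell \mapsto v \triangleright s[\ell / x]}$$

$$\text{(EL$^U$-call)}\quad \frac{\overline{f} = \overline{f'}; f' \qquad f(x) \mapsto s; \mathtt{return}; \in C.\mathtt{funs} \qquad C.\mathtt{intfs} \vdash f', f : \mathit{in} \qquad H \triangleright e \hookrightarrow v}{C, H \triangleright (\mathtt{call}\ f\ e)_{\overline{f}} \xrightarrow{\ \mathtt{call}\ f\ v\ H?\ } C, H \triangleright (s; \mathtt{return};[v / x])_{\overline{f}; f}}$$

$$\text{(EL$^U$-return)}\quad \frac{\overline{f} = \overline{f'}; f' \qquad C.\mathtt{intfs} \vdash f, f' : \mathit{out}}{C, H \triangleright (\mathtt{return};)_{\overline{f}; f} \xrightarrow{\ \mathtt{ret}\ H!\ } C, H \triangleright (\mathtt{skip})_{\overline{f}}}$$

$$\text{(EL$^U$-callback)}\quad \frac{\overline{f} = \overline{f'}; f' \qquad f(x) \mapsto s; \mathtt{return}; \in \overline{F} \qquad C.\mathtt{intfs} \vdash f', f : \mathit{out} \qquad H \triangleright e \hookrightarrow v}{C, H \triangleright (\mathtt{call}\ f\ e)_{\overline{f}} \xrightarrow{\ \mathtt{call}\ f\ v\ H!\ } C, H \triangleright (s; \mathtt{return};[v / x])_{\overline{f}; f}}$$

$$\text{(EL$^U$-retback)}\quad \frac{\overline{f} = \overline{f'}; f' \qquad C.\mathtt{intfs} \vdash f, f' : \mathit{in}}{C, H \triangleright (\mathtt{return};)_{\overline{f}; f} \xrightarrow{\ \mathtt{ret}\ H?\ } C, H \triangleright (\mathtt{skip})_{\overline{f}}}$$

$$\text{(EL$^U$-ret-internal)}\quad \frac{C.\mathtt{intfs} \vdash f, f' : \mathit{internal} \qquad \overline{f} = \overline{f'}; f'}{C, H \triangleright (\mathtt{return};)_{\overline{f}; f} \xrightarrow{\ \epsilon\ } C, H \triangleright (\mathtt{skip})_{\overline{f}}}$$

$$\text{(EL$^U$-call-internal)}\quad \frac{C.\mathtt{intfs} \vdash f, f' : \mathit{internal} \qquad \overline{f'} = \overline{f''}; f' \qquad f(x) \mapsto s; \mathtt{return}; \in C.\mathtt{funs} \qquad H \triangleright e \hookrightarrow v}{C, H \triangleright (\mathtt{call}\ f\ e)_{\overline{f'}} \xrightarrow{\ \epsilon\ } C, H \triangleright (s; \mathtt{return};[v / x])_{\overline{f'}; f}}$$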
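The role of the internal/in/out judgement in deciding whether a step emits a label can be sketched as follows. This is a simplified illustration: it classifies a control transfer by checking membership in a set of component-defined function names, whereas the semantics above derives the judgement from the component interfaces. The names `classify` and `call_label` are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Label = Tuple[str, str, int, Dict[str, int], str]


def classify(component_funs: Set[str], source: str, destination: str) -> str:
    """Classify a control transfer from `source` to `destination`."""
    src_in_component = source in component_funs
    dst_in_component = destination in component_funs
    if src_in_component == dst_in_component:
        return "internal"  # both in the component, or both in the attacker
    return "in" if dst_in_component else "out"


def call_label(component_funs: Set[str], caller: str, callee: str,
               arg: int, heap: Dict[str, int]) -> Optional[Label]:
    """Return the trace label for a call, or None for a silent (internal) step."""
    kind = classify(component_funs, caller, callee)
    if kind == "internal":
        return None
    decoration = "?" if kind == "in" else "!"
    return ("call", callee, arg, dict(heap), decoration)
```

A call from attacker code into the component yields a `?`-decorated label, a callback yields a `!`-decorated one, and calls that stay on one side of the boundary are silent.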


3.2 The Target Language $L^P$ 


$L^P$ is an untyped, imperative language that follows the structure of $L^U$ and it 
has similar expressions and statements. However, there are critical differences 
(that make the compiler interesting). The main difference is that heap 
locations in $L^P$ are concrete natural numbers. Upfront, an adversarial context can 
guess locations used as private state by a component and clobber them. To 
support hidden local state, a location can be “hidden” explicitly via the statement 
let x = hide e in s, which allocates a new capability $k$, an abstract token that 
grants access to the location $n$ to which $e$ points [64]. Subsequently, all reads and 
writes to $n$ must be authenticated with the capability, so reading and writing 
a location take another parameter as follows: !e with e and x := e with e. In 
both cases, the e after the with is the capability. Unlike locations, capabilities 
cannot be guessed. To make a location private, the compiler can make the 
capability of the location private. To bootstrap this hiding process, we assume that 
a component has one location that can only be accessed by it, a priori in the 
semantics (in our formalization, we always focus on only one component and we 
assume that, for this component, this special location is at address 0). 

In detail, $L^P$ heaps $H$ are maps from natural numbers (locations) $n$ to values 
$v$ and a tag $\eta$, as well as capabilities, so $H ::= \varnothing \mid H; n \mapsto v : \eta \mid H; k$. The 
tag $\eta$ can be $\bot$, which means that $n$ is globally available (not protected), or a 
capability $k$, which protects $n$. A globally available location can be freely read 
and written but one that is protected by a capability requires the capability to 
be supplied at the time of read/write (Rule EL$^P$-assign, Rule EL$^P$-deref). 

$L^P$ also has a big-step semantics for expressions, a labelled small-step 
semantics and a semantics that accumulates traces analogous to that of $L^U$. 
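$$\text{(EL$^P$-deref)}\quad \frac{n \mapsto v : \eta \in H \qquad (\eta = \bot) \text{ or } (\eta = k \text{ and } v' = k)}{H \triangleright\ !n\ \mathtt{with}\ v' \hookrightarrow v}$$

$$\text{(EL$^P$-new)}\quad \frac{H = H_1; n \mapsto (v', \eta) \qquad H \triangleright e \hookrightarrow v \qquad H' = H; n+1 \mapsto v : \bot}{C, H \triangleright \mathtt{let}\ x = \mathtt{new}\ e\ \mathtt{in}\ s \xrightarrow{\ \epsilon\ } C, H' \triangleright s[n+1 / x]}$$

$$\text{(EL$^P$-hide)}\quad \frac{H \triangleright e \hookrightarrow n \qquad k \notin \mathrm{dom}(H) \qquad H = H_1; n \mapsto v : \bot; H_2 \qquad H' = H_1; n \mapsto v : k; H_2; k}{C, H \triangleright \mathtt{let}\ x = \mathtt{hide}\ e\ \mathtt{in}\ s \xrightarrow{\ \epsilon\ } C, H' \triangleright s[k / x]}$$

$$\text{(EL$^P$-assign)}\quad \frac{\begin{array}{c} H \triangleright e \hookrightarrow v \qquad H = H_1; n \mapsto \_ : \eta; H_2 \qquad H' = H_1; n \mapsto v : \eta; H_2 \\ (\eta = \bot) \text{ or } (\eta = k \text{ and } v' = k) \end{array}}{C, H \triangleright n := e\ \mathtt{with}\ v' \xrightarrow{\ \epsilon\ } C, H' \triangleright \mathtt{skip}}$$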
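To make the capability discipline of these rules concrete, here is a minimal sketch of a capability-tagged heap. It is an illustration only: the class names `TargetHeap`, `Capability` and `CapabilityError` are hypothetical, capabilities are modelled as fresh Python objects compared by identity (so they cannot be forged from numbers), and the special treatment of address 0 is not modelled.

```python
from typing import Any, Dict, List, Optional


class CapabilityError(Exception):
    """Raised when a protected location is accessed without its capability."""


class Capability:
    """Opaque token: only identity matters, so it cannot be guessed."""


class TargetHeap:
    def __init__(self) -> None:
        # address -> [value, tag]; a tag of None stands for "unprotected"
        self.cells: Dict[int, List[Any]] = {}

    def new(self, value: Any) -> int:
        """Allocate the next natural-number address, initially unprotected."""
        address = max(self.cells, default=-1) + 1
        self.cells[address] = [value, None]
        return address

    def hide(self, address: int) -> Capability:
        """Protect an unprotected location with a fresh capability."""
        cell = self.cells[address]
        if cell[1] is not None:
            raise CapabilityError("location is already protected")
        cell[1] = Capability()
        return cell[1]

    def _check(self, address: int, cap: Optional[Capability]) -> None:
        tag = self.cells[address][1]
        if tag is not None and tag is not cap:
            raise CapabilityError("missing or wrong capability")

    def deref(self, address: int, cap: Optional[Capability] = None) -> Any:
        self._check(address, cap)
        return self.cells[address][0]

    def assign(self, address: int, value: Any,
               cap: Optional[Capability] = None) -> None:
        self._check(address, cap)
        self.cells[address][0] = value
```

An adversary can still guess the numeric address of a hidden location, but every read or write through that address fails unless the matching capability is supplied.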


A second difference between $L^P$ and $L^U$ is that $L^P$ has no booleans, while 
$L^U$ has them. This makes the compiler and the related proofs interesting, as 
discussed in the proof of Theorem 1. 

In $L^P$, the locations of interest to a monitor are all those that can be reached 
from the address 0. 0 itself is protected with a capability $k_{root}$ that is assumed 
to occur only in the code of the component in focus, so a component is defined 
as $C ::= k_{root}; \overline{F}; \overline{I}$. We can now give a precise definition of component-monitor 
agreement for $L^P$ as well as a precise definition of attacker, which must take 
the $k_{root}$ capability into account. 


$$M \frown C \;\stackrel{\mathrm{def}}{=}\; (M = (\{\sigma\cdots\}, \leadsto, \sigma_0, k_{root}, \sigma_c)) \text{ and } (C = (k_{root}; \overline{F}; \overline{I}))$$

$$C \vdash A : \mathit{atk} \;\stackrel{\mathrm{def}}{=}\; C = (k_{root}; \overline{F}; \overline{I}),\ A = \overline{F'},\ k_{root} \notin \mathrm{fn}(\overline{F'})$$


3.3 Compiler from $L^U$ to $L^P$ 


We now present $\llbracket\cdot\rrbracket^{L^U}_{L^P}$, the compiler from $L^U$ to $L^P$, detailing how it uses the 
capabilities of $L^P$ to achieve RSC. Then, we prove that $\llbracket\cdot\rrbracket^{L^U}_{L^P}$ attains RSC. 


Compiler $\llbracket\cdot\rrbracket^{L^U}_{L^P}$ takes as input an $L^U$ component $C$ and returns an $L^P$ component 
(excerpts of the translation are shown below). The compiler performs a simple 
pass on the structure of functions, expressions and statements. Each $L^U$ location 
is encoded as a pair of an $L^P$ location and the capability to access the location; 
location update and dereference are compiled accordingly. The compiler codes 
source booleans true to 0 and false to 1, and the source number n to the target 
counterpart n. 


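$$\llbracket \ell_{root}; \overline{F}; \overline{I} \rrbracket^{L^U}_{L^P} = k_{root}; \llbracket \overline{F} \rrbracket^{L^U}_{L^P}; \llbracket \overline{I} \rrbracket^{L^U}_{L^P}$$

$$\llbracket\, !e \,\rrbracket^{L^U}_{L^P} = \ !\llbracket e \rrbracket^{L^U}_{L^P}.1\ \mathtt{with}\ \llbracket e \rrbracket^{L^U}_{L^P}.2$$

$$\llbracket \mathtt{let}\ x = \mathtt{new}\ e\ \mathtt{in}\ s \rrbracket^{L^U}_{L^P} = \mathtt{let}\ x_{loc} = \mathtt{new}\ \llbracket e \rrbracket^{L^U}_{L^P}\ \mathtt{in}\ \mathtt{let}\ x_{cap} = \mathtt{hide}\ x_{loc}\ \mathtt{in}\ \mathtt{let}\ x = \langle x_{loc}, x_{cap} \rangle\ \mathtt{in}\ \llbracket s \rrbracket^{L^U}_{L^P}$$

$$\llbracket x := e' \rrbracket^{L^U}_{L^P} = \mathtt{let}\ x_{loc} = x.1\ \mathtt{in}\ \mathtt{let}\ x_{cap} = x.2\ \mathtt{in}\ x_{loc} := \llbracket e' \rrbracket^{L^U}_{L^P}\ \mathtt{with}\ x_{cap}$$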
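The translation excerpts can be mimicked by a small recursive function over syntax trees. The sketch below is an illustration under assumptions: programs are encoded as nested Python tuples with hypothetical tags (`"num"`, `"bool"`, `"var"`, `"deref"`, `"new"`, `"assign"`, `"skip"`), only the cases shown in the excerpts are covered, and the helper variable names are formed by appending `_loc` and `_cap` without any freshness check.

```python
from typing import Any, Tuple

Node = Tuple[Any, ...]


def compile_expr(e: Node) -> Node:
    """Translate a source expression into a target expression."""
    tag = e[0]
    if tag in ("num", "var"):
        return e
    if tag == "bool":
        # Source true is coded as 0 and false as 1.
        return ("num", 0 if e[1] else 1)
    if tag == "deref":
        # A source location is a pair (target location, capability).
        loc = compile_expr(e[1])
        return ("deref", ("proj", 1, loc), ("proj", 2, loc))
    raise ValueError(f"unsupported expression: {e!r}")


def compile_stmt(s: Node) -> Node:
    """Translate a source statement into a target statement."""
    tag = s[0]
    if tag == "skip":
        return s
    if tag == "new":
        _, x, e, body = s
        x_loc, x_cap = x + "_loc", x + "_cap"
        # Allocate, hide behind a fresh capability, then bind x to the pair.
        return ("new", x_loc, compile_expr(e),
                ("hide", x_cap, ("var", x_loc),
                 ("let", x, ("pair", ("var", x_loc), ("var", x_cap)),
                  compile_stmt(body))))
    if tag == "assign":
        _, x, e = s
        x_loc, x_cap = x + "_loc", x + "_cap"
        # Unpack the pair and authenticate the write with the capability.
        return ("let", x_loc, ("proj", 1, ("var", x)),
                ("let", x_cap, ("proj", 2, ("var", x)),
                 ("assign", ("var", x_loc), compile_expr(e), ("var", x_cap))))
    raise ValueError(f"unsupported statement: {s!r}")
```

Note that the output contains no runtime checks: protection comes entirely from the `hide` step and the capability passed to each access.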
This compiler solely relies on the capability abstraction of the target 
language as a defence mechanism to attain RSC. Unlike existing secure compilers, 
$\llbracket\cdot\rrbracket^{L^U}_{L^P}$ needs neither dynamic checks nor other constructs that introduce runtime 
overhead to attain RSC [9,32,39,53,59]. 


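Proof of RSC. Compiler $\llbracket\cdot\rrbracket^{L^U}_{L^P}$ attains RSC (Theorem 1). In order to set up this 
theorem, we need to instantiate the cross-language relation for values, which we 
write as $\approx_\beta$ here. The relation is parametrised by a partial bijection $\beta : \ell \times \mathbf{n} \times \boldsymbol{\eta}$ 
from source heap locations to target heap locations which determines when a 
source location and a target location (and its capability) are related. On values, 
$\approx_\beta$ is defined as follows: 

$\mathit{true} \approx_\beta \mathbf{0}$; 
$\mathit{false} \approx_\beta \mathbf{n}$ when $\mathbf{n} \neq \mathbf{0}$; 
$n \approx_\beta \mathbf{n}$; 
$\ell \approx_\beta \langle \mathbf{n}, \mathbf{k} \rangle$ if $(\ell, \mathbf{n}, \mathbf{k}) \in \beta$; 
$\ell \approx_\beta \langle \mathbf{n}, \_ \rangle$ if $(\ell, \mathbf{n}, \bot) \in \beta$; 
$\langle v_1, v_2 \rangle \approx_\beta \langle \mathbf{v_1}, \mathbf{v_2} \rangle$ if $v_1 \approx_\beta \mathbf{v_1}$ and $v_2 \approx_\beta \mathbf{v_2}$. 

This relation is then used to define the heap, monitor state and action 
relations. Heaps are related, written $H \approx_\beta \mathbf{H}$, when locations related in $\beta$ point 
to related values. States are related, written $\Omega \approx_\beta \boldsymbol{\Omega}$, when they have related 
heaps. The action relation ($\alpha \approx_\beta \boldsymbol{\alpha}$) is defined as in Sect. 2.2. 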
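The value relation can be checked mechanically on small examples. The sketch below is an illustration under assumptions: source locations are wrapped in a hypothetical `SrcLoc` class, target capabilities are plain strings, pairs are 2-tuples, and `beta` is a set of triples whose third component is `None` for an unprotected target location.

```python
from dataclasses import dataclass
from typing import Any, Optional, Set, Tuple


@dataclass(frozen=True)
class SrcLoc:
    """Hypothetical wrapper for an abstract source location."""
    name: str


Beta = Set[Tuple[str, int, Optional[str]]]


def is_number(v: Any) -> bool:
    # bool is a subclass of int in Python, so it must be excluded explicitly.
    return isinstance(v, int) and not isinstance(v, bool)


def related(beta: Beta, src: Any, tgt: Any) -> bool:
    """Decide whether source value src is related to target value tgt."""
    if isinstance(src, bool):
        # true relates to 0, false relates to any non-zero number.
        return is_number(tgt) and ((tgt == 0) if src else (tgt != 0))
    if is_number(src):
        return is_number(tgt) and src == tgt
    if isinstance(src, SrcLoc):
        if not (isinstance(tgt, tuple) and len(tgt) == 2):
            return False
        address, cap = tgt
        # Protected: the capability must match. Unprotected: any second field.
        return (src.name, address, cap) in beta or (src.name, address, None) in beta
    if isinstance(src, tuple) and len(src) == 2:
        return (isinstance(tgt, tuple) and len(tgt) == 2
                and related(beta, src[0], tgt[0])
                and related(beta, src[1], tgt[1]))
    return False
```

The relation is deliberately not injective: both the source number 0 and the source boolean true are related to the target value 0.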


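Monitor Relation. In Sect. 2.2, we left the monitor relation abstract. Here, we 
define it for our two languages. Two monitors are related when they can 
simulate each other on related heaps. Given a monitor-specific relation $\sigma \approx \boldsymbol{\sigma}$ on 
monitor states, we say that a relation $R$ on source and target monitors is a 
bisimulation if the following hold whenever $M = (\{\sigma\cdots\}, \leadsto, \sigma_0, \ell_{root}, \sigma_c)$ and 
$\mathbf{M} = (\{\boldsymbol{\sigma}\cdots\}, \boldsymbol{\leadsto}, \boldsymbol{\sigma}_0, \mathbf{k}_{root}, \boldsymbol{\sigma}_c)$ are related by $R$: 


1. $\sigma_0 \approx \boldsymbol{\sigma}_0$, and $\sigma_c \approx \boldsymbol{\sigma}_c$, and 
2. For all $\beta$ containing $(\ell_{root}, \mathbf{0}, \mathbf{k}_{root})$ and all $H$, $\mathbf{H}$ with $H \approx_\beta \mathbf{H}$: 
(a) $(\sigma_c, H, \_) \in\ \leadsto$ iff $(\boldsymbol{\sigma}_c, \mathbf{H}, \_) \in\ \boldsymbol{\leadsto}$, and 
(b) $(\sigma_c, H, \sigma') \in\ \leadsto$ and $(\boldsymbol{\sigma}_c, \mathbf{H}, \boldsymbol{\sigma}') \in\ \boldsymbol{\leadsto}$ imply 
$(\{\sigma\cdots\}, \leadsto, \sigma_0, \ell_{root}, \sigma')\ R\ (\{\boldsymbol{\sigma}\cdots\}, \boldsymbol{\leadsto}, \boldsymbol{\sigma}_0, \mathbf{k}_{root}, \boldsymbol{\sigma}')$. 


In words, $R$ is a bisimulation only if $M\,R\,\mathbf{M}$ implies that $M$ and $\mathbf{M}$ simulate each 
other on heaps related by any $\beta$ that relates $\ell_{root}$ to $\mathbf{0}$. In particular, this means 
that neither $M$ nor $\mathbf{M}$ can be sensitive to the specific addresses allocated during 
the run of the program. However, they can be sensitive to the “shape” of the heap 
or the values stored in the heap. Note that the union of any two bisimulations 
is a bisimulation. Hence, there is a largest bisimulation, which we denote as $\approx$. 
Intuitively, $M \approx \mathbf{M}$ implies that $M$ and $\mathbf{M}$ encode the same safety property (up to 
the aforementioned relation on values $\approx_\beta$). With all the boilerplate for RSC in 
place, we state our main theorem. 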


Theorem 1 ($\llbracket\cdot\rrbracket^{L^U}_{L^P}$ attains RSC). $\vdash \llbracket\cdot\rrbracket^{L^U}_{L^P} : \mathit{RSC}$ 


[Figure 1: a target trace with three actions, (1) call f 0 ...?, (2) ret ...! and 
(3) call f 2 ...?, shown next to the source function main(x) produced by the 
backtranslation; dotted lines connect each generated code block to the part of 
the trace it reproduces.] 


Fig. 1. Example of a trace and its backtranslated code. 


We outline our proof of Theorem 1, which relies on a backtranslation $\langle\!\langle\cdot\rangle\!\rangle^{L^P}_{L^U}$. 
Intuitively, $\langle\!\langle\cdot\rangle\!\rangle^{L^P}_{L^U}$ takes a target trace $\overline{\boldsymbol{\alpha}}$ and builds a set of source contexts such 
that one of them, when linked with $C$, produces a related trace $\overline{\alpha}$ in the source 
(Theorem 2). In prior work, backtranslations return a single context 
[10,11,21,28,50,53,59]. This is because they all, explicitly or implicitly, assume that $\approx$ is 
injective from source to target. Under this assumption, the backtranslation is 
unique: a target value $\mathbf{v}$ will be related to at most one source value $v$. We do 
away with this assumption (e.g., the target value 0 is related to both source 
values 0 and true) and thus there can be multiple source values related to any 
given target value. This results in a set of backtranslated contexts, of which at 
least one will reproduce the trace as we need it. 
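The reason a set of contexts arises can be seen on plain numeric values. The sketch below is only an illustration of the non-injectivity argument, not the backtranslation itself: it enumerates the source candidates for a numeric target value under the coding of booleans used by the compiler (true as 0, false as any non-zero number), and then the candidate tuples for a list of target values. The function names are hypothetical.

```python
from itertools import product
from typing import List, Union

SourceValue = Union[int, bool]


def source_candidates(target_value: int) -> List[SourceValue]:
    """All source values related to a numeric target value."""
    if target_value == 0:
        return [0, True]           # 0 relates to the number 0 and to true
    return [target_value, False]   # n != 0 relates to the number n and to false


def candidate_tuples(target_values: List[int]) -> List[List[SourceValue]]:
    """Every combination of source candidates, one per target value."""
    choices = [source_candidates(v) for v in target_values]
    return [list(combo) for combo in product(*choices)]
```

Each target value doubles the number of candidates, so a trace mentioning several values yields several candidate contexts; the correctness argument only needs one of them to reproduce the trace.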

We bypass the lengthy technical setup for this proof and provide an informal
description of why the backtranslation achieves what it is supposed to. As an
example, Fig. 1 contains a trace $\overline{\boldsymbol{\alpha}}$ and the output of $\langle\!\langle\overline{\boldsymbol{\alpha}}\rangle\!\rangle^{L^P}_{L^U}$.

$\langle\!\langle\cdot\rangle\!\rangle^{L^P}_{L^U}$ first generates empty method bodies for all context methods called
by the compiled component. Then it backtranslates each action on the given
trace, generating code blocks that mimic that action, and places that code inside
the appropriate method body. Figure 1 shows the code blocks generated for each
action. Backtranslated code maintains a support data structure at runtime, a
list of locations denoted L, to which locations are added (::) and in which they are
looked up (L(n)) based on their second field n, which is their target-level address. In order
to backtranslate the first call, we need to set up the heap with the right values
and then perform the call. In the diagram, dotted lines describe which source
statement generates which part of the heap. The return only generates code that
will update the list L to ensure that the context has access to all the locations
it knows in the target too. In order to backtranslate the last call we look up the
locations to be updated in L so we can ensure that when the call f 2 statement
is executed, the heap is in the right state.
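
The informal description above can be illustrated with a small Python sketch. It is not the backtranslation defined in the paper: the tuple encoding of actions, the function name `backtranslate`, and the placeholder `reach(n)` (standing for whatever source expression reaches a newly shared location) are all assumptions made for illustration. The sketch also ignores capabilities and the non-injectivity of the value relation, so it produces one candidate context rather than a set.

```python
def backtranslate(trace):
    """Return source-like code lines that mimic each action of `trace`.

    Hypothetical encoding of actions:
      ("call", f, arg, heap)  -- the context calls f; heap maps address -> value
      ("ret", heap)           -- the component returns to the context
    """
    known = set()   # target addresses already recorded in the runtime list L
    code = []
    for action in trace:
        kind = action[0]
        if kind == "call":
            _, fname, arg, heap = action
            for addr in sorted(heap):
                value = heap[addr]
                if addr in known:
                    # Location already in L: look it up and update its content.
                    code.append(f"let x = L({addr}) in x := {value};")
                else:
                    # Fresh location: allocate it and record it in L.
                    code.append(f"let x = new {value} in L :: <x, {addr}>;")
                    known.add(addr)
            code.append(f"call {fname} {arg};")
        elif kind == "ret":
            _, heap = action
            for addr in sorted(heap):
                if addr not in known:
                    # A return only extends L with newly shared locations.
                    code.append(f"L :: <reach({addr}), {addr}>;")
                    known.add(addr)
        else:
            raise ValueError(f"unknown action kind: {kind}")
    return code


example_trace = [
    ("call", "f", 0, {1: 4, 2: 3}),
    ("ret", {1: 4, 2: 3, 3: 1}),
    ("call", "f", 2, {1: 55, 3: 15}),
]
example_code = backtranslate(example_trace)
```

The example trace is only loosely modelled on Fig. 1: the first call allocates and registers two locations, the return registers a third, and the last call looks locations up in L before updating them.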

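For the backtranslation to be used in the proof we need to prove its correctness,
i.e., that $\langle\!\langle\cdot\rangle\!\rangle^{L^P}_{L^U}$ generates a context $A$ that, together with $C$, generates a
trace $\overline{\alpha}$ related to the given target trace $\overline{\boldsymbol{\alpha}}$.


Theorem 2 ($\langle\!\langle\cdot\rangle\!\rangle^{L^P}_{L^U}$ is correct).


If $\mathbf{A}[\llbracket C\rrbracket^{L^U}_{L^P}] \xRightarrow{\overline{\boldsymbol{\alpha}}} \boldsymbol{\Omega}$ then $\exists A \in \langle\!\langle\overline{\boldsymbol{\alpha}}\rangle\!\rangle^{L^P}_{L^U}.\ A[C] \xRightarrow{\overline{\alpha}} \Omega$ and $\overline{\alpha} \approx_\beta \overline{\boldsymbol{\alpha}}$ and $\Omega \approx_\beta \boldsymbol{\Omega}$.

This theorem immediately implies that $\vdash \llbracket\cdot\rrbracket^{L^U}_{L^P} : \mathit{PF\text{-}RSC}$, which, by Theorem
3 below, implies that $\vdash \llbracket\cdot\rrbracket^{L^U}_{L^P} : \mathit{RSC}$.


Theorem 3 (PF-RSC and RSC are equivalent for $\llbracket\cdot\rrbracket^{L^U}_{L^P}$).

$\vdash \llbracket\cdot\rrbracket^{L^U}_{L^P} : \mathit{PF\text{-}RSC} \iff \vdash \llbracket\cdot\rrbracket^{L^U}_{L^P} : \mathit{RSC}$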


Example 4 (Compiling a secure program). To illustrate RSC at work, let us
consider the following source component $C_a$, which manages an account whose
balance is security-relevant. Accordingly, the balance is stored in a location $\ell_{root}$
that is tracked by the monitor. $C_a$ provides functions to deposit to the account
as well as to print the account balance.


deposit(x) ↦ let q = abs(x) in let amt = !ℓ_root in ℓ_root := amt + q
balance() ↦ !ℓ_root


$C_a$ never leaks any sensitive location ($\ell_{root}$) to an attacker. Additionally, an
attacker has no way to decrement the amount of the balance since deposit only
adds the absolute value abs(x) of its input x to the existing balance.


By compiling $C_a$ with $\llbracket\cdot\rrbracket^{L^U}_{L^P}$ we obtain the following target program.


deposit(x) ↦ let q = abs(x) in
             let amt = !0 with k_root in 0 := amt + q with k_root
balance() ↦ !0 with k_root


Recall that location $\ell_{root}$ is mapped to location 0 and protected by the $k_{root}$
capability. In the compiled code, while location 0 is freely computable by a
target attacker, capability $k_{root}$ is not. Since that capability is not leaked to
an attacker, an attacker will not be able to tamper with the balance stored in
location 0.
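
A toy Python model may help to see why knowing the address 0 is not enough. This is only a sketch under simplifying assumptions: the class names `Capability` and `ProtectedHeap`, the use of Python object identity to model unforgeable capabilities, and the `PermissionError` raised on a failed check are illustrative choices, not the semantics of the target language.

```python
class Capability:
    """An opaque token; a fresh instance cannot be guessed or recomputed."""


class ProtectedHeap:
    """Toy heap: each address maps to a value and an optional protecting capability."""

    def __init__(self):
        self._cells = {}   # address -> (value, capability or None)

    def alloc(self, addr, value, cap=None):
        self._cells[addr] = (value, cap)

    def _check(self, addr, cap):
        _, required = self._cells[addr]
        if required is not None and required is not cap:
            raise PermissionError(f"location {addr} is protected")

    def read(self, addr, cap=None):
        self._check(addr, cap)
        return self._cells[addr][0]

    def write(self, addr, value, cap=None):
        self._check(addr, cap)
        self._cells[addr] = (value, self._cells[addr][1])


heap = ProtectedHeap()
k_root = Capability()          # known to the compiled component only
heap.alloc(0, 0, k_root)       # the balance lives at address 0


def deposit(x):
    q = abs(x)
    amt = heap.read(0, k_root)
    heap.write(0, amt + q, k_root)


def balance():
    return heap.read(0, k_root)


def attacker_tamper():
    """The attacker can name address 0 but can only present a capability of its own."""
    try:
        heap.write(0, 0, Capability())
        return True
    except PermissionError:
        return False


deposit(-5)
deposit(7)
```

After the two deposits the balance is 12, and the attacker's write is rejected because its capability is not the one protecting location 0.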


4 RSC via Bisimulation 


If the source language has a verification system that enforces robust safety, 
proving that a compiler attains RSC can be simpler than that of Sect. 3—it 
may not require a back translation. To demonstrate this, we consider a specific 
class of monitors, namely those that enforce type invariants on a specific set of 
locations. Our source language, $L^\tau$, is similar to $L^U$ but it has a type system
that accepts only those source programs whose traces the source monitor never
rejects. Our compiler $\llbracket\cdot\rrbracket^{L^\tau}_{L^\pi}$ is directed by typing derivations, and its proof of RSC
establishes a specific cross-language invariant on program execution, rather than
a backtranslation. A second, independent goal of this section is to show that RSC
is compatible with concurrency. Consequently, our source and target languages
include constructs for forking threads.


4.1 The Source Language $L^\tau$


$L^\tau$ extends $L^U$ with concurrency, so it has a fork statement ($\parallel s$), processes and
process soups [19]. Components define a set of safety-relevant locations $\Delta$, so
$C ::= \Delta; F; I$ and heaps carry type information, so $H ::= \varnothing \mid H; \ell \mapsto v : \tau$. $\Delta$ also
specifies a type for each safety-relevant location, so $\Delta ::= \varnothing \mid \Delta; (\ell : \tau)$.

$L^\tau$ has an unconventional type system that enforces robust type safety [1,14,
31,34,45,58], which means that no context can cause the static types of sen- 
sitive heap locations to be violated at runtime. Using a special type UN that 
is described below, a program component statically partitions heap locations it 
deals with into those it cares about (sensitive or “trusted” locations) and those 
it does not care about (“untrusted” locations). Call a value shareable if only 
untrusted locations can be extracted from it using the language’s elimination 
constructs. The type system then ensures that a program component only ever 
shares shareable values with the context. This ensures that the context cannot 
violate any invariants (including static types) of the trusted locations, since it 
can never get direct access to them.

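Technically, the type system considers the types $\tau ::= \mathrm{Bool} \mid \mathrm{Nat} \mid \tau \times \tau \mid \mathrm{Ref}\ \tau \mid \mathrm{UN}$
and the following typing judgements ($\Gamma$ maps variables to types).


$\vdash C : \mathrm{UN}$   Component $C$ is well-typed.
$\Delta, \Gamma \vdash e : \tau$   Expression $e$ has type $\tau$.
$\tau \vdash \circ$   Type $\tau$ is shareable.
$C, \Delta, \Gamma \vdash s$   Statement $s$ is well-typed.

(TL$^\tau$-bool-pub)   $\mathrm{Bool} \vdash \circ$
(TL$^\tau$-nat-pub)   $\mathrm{Nat} \vdash \circ$
(TL$^\tau$-pair-pub)   $\dfrac{\tau \vdash \circ \quad \tau' \vdash \circ}{\tau \times \tau' \vdash \circ}$
(TL$^\tau$-un-pub)   $\mathrm{UN} \vdash \circ$
(TL$^\tau$-references-pub)   $\mathrm{Ref}\ \mathrm{UN} \vdash \circ$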


Type UN stands for “untrusted” or “shareable” and contains all values that can 
be passed to the context. Every type that is not a subtype of UN is implicitly 
trusted and cannot be passed to the context. Untrusted locations are explic- 
itly marked UN at their allocation points in the program. Other types are 
deemed shareable via subtyping. Intuitively, a type is safe if values in it can only 
yield locations of type UN by the language elimination constructs. For example, 
$\mathrm{UN} \times \mathrm{UN}$ is a subtype of UN. We write $\tau \vdash \circ$ to mean that $\tau$ is a subtype of UN.
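
The five shareability rules (bool-pub, nat-pub, pair-pub, un-pub, references-pub) can be read as a recursive check on types. The following Python sketch encodes that reading; the dataclass representation of types and the name `is_shareable` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bool:
    pass


@dataclass(frozen=True)
class Nat:
    pass


@dataclass(frozen=True)
class UN:
    pass


@dataclass(frozen=True)
class Pair:
    left: object
    right: object


@dataclass(frozen=True)
class Ref:
    target: object


def is_shareable(tau):
    """Decide whether a type is shareable, i.e. a subtype of UN."""
    if isinstance(tau, (Bool, Nat, UN)):
        return True                      # bool-pub, nat-pub, un-pub
    if isinstance(tau, Pair):
        # pair-pub: both components must be shareable
        return is_shareable(tau.left) and is_shareable(tau.right)
    if isinstance(tau, Ref):
        # references-pub: only references to untrusted content are shareable
        return isinstance(tau.target, UN)
    return False
```

For instance, `Pair(UN(), UN())` and `Ref(UN())` are shareable, whereas `Ref(Nat())` is trusted and must not be passed to the context.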

Further, $L^\tau$ contains an endorsement statement (endorse x = e as $\varphi$ in s) that
dynamically checks the top-level constructor of a value of type UN and gives it
a more precise superficial type $\varphi ::= \mathrm{Bool} \mid \mathrm{Nat} \mid \mathrm{UN} \times \mathrm{UN} \mid \mathrm{Ref}\ \mathrm{UN}$ [24]. This
allows a program to safely inspect values coming from the context. It is similar 
to existing type casts [48] but it only inspects one structural layer of the value 
(this simplifies the compilation). 

The operational semantics of $L^\tau$ updates that of $L^U$ to deal with concurrency
and endorsement. The latter performs a runtime check on the endorsed value [62]. 
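
As a rough illustration of the one-layer runtime check performed by endorsement, consider the Python sketch below. The encoding of runtime values (Python booleans, non-negative integers, 2-tuples and a hypothetical `Loc` class), the string names of the superficial types, and the choice to raise `TypeError` when the check fails are all assumptions; the text above does not specify what happens on failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """Hypothetical runtime representation of a heap location."""
    addr: int


def top_level_constructor(value):
    """Classify a value by its outermost constructor only."""
    if isinstance(value, bool):          # test bool first: bool is a subclass of int
        return "Bool"
    if isinstance(value, int) and value >= 0:
        return "Nat"
    if isinstance(value, tuple) and len(value) == 2:
        return "UN x UN"                 # components are not inspected
    if isinstance(value, Loc):
        return "Ref UN"
    return None


def endorse(value, phi, body):
    """Run `body` on `value` only if its top-level constructor matches `phi`."""
    if top_level_constructor(value) != phi:
        raise TypeError(f"value does not have superficial type {phi}")
    return body(value)
```

Endorsing a pair as `"UN x UN"` succeeds whatever its components are, since only one structural layer is inspected; the components remain at type UN and need their own endorsement before being used at a more precise type.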

Monitors $M ::= (\{\sigma\cdots\}, \rightsquigarrow, \sigma_0, \Delta, \sigma_c)$ check at runtime that the set of
trusted heap locations $\Delta$ have values of their intended static types. Accordingly,
the description of the monitor includes a list of trusted locations and their
expected types (in the form of an environment $\Delta$). The type $\tau$ of any location
in $\Delta$ must be trusted, so $\tau \nvdash \circ$. To facilitate checks of the monitor, every heap
location carries a type at runtime (in addition to a value). The monitor transitions
should therefore be of the form $(\sigma, \Delta, \sigma)$, but since $\Delta$ never changes, we
write the transitions as $(\sigma, \sigma)$.

A monitor and a component agree if they have the same $\Delta$:
$M \frown C \triangleq (\{\sigma\cdots\}, \rightsquigarrow, \sigma_0, \Delta, \sigma_c) \frown (\Delta; F; I)$. Other definitions (safety, robust safety and
actions) are as in Sect. 2. Importantly, a well-typed component generates traces
that are always accepted, so every component typed at UN is robustly safe.


Theorem 4 (Typability Implies Robust Safety in $L^\tau$)


If $\vdash C : \mathrm{UN}$ and $C \frown M$ then $M \vdash C : \mathit{rs}$


Richer Source Monitors. In $L^\tau$, source language monitors only enforce the
property of type safety on specific memory locations (robustly). This can be
generalized substantially to enforce arbitrary invariants other than types on locations.
The only requirement is to find a type system (e.g., based on refinements or 
Hoare logics) that can enforce robust safety in the source (cf. [68]). Our com- 
pilation and proof strategy should work with little modification. Another easy 
generalization is allowing the set of locations considered by the monitor to grow 
over time, as in Sect. 3. 


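4.2 The Target Language $L^\pi$


Our target language, $L^\pi$, extends the previous target language $L^P$
with support for concurrency (forking, processes and process soups),
atomic co-creation of a protected location and its protecting capability
(let x = newhide e in s) and for examining the top-level construct of a value
(destruct x = e as B in s or s') according to a pattern (B ::= nat | pair).


(EL$^\pi$-destruct-nat)
$$\frac{H \triangleright e \hookrightarrow n}{C, H \triangleright \mathrm{destruct}\ x = e\ \mathrm{as}\ \mathrm{nat}\ \mathrm{in}\ s\ \mathrm{or}\ s' \;\to\; C, H \triangleright s[n/x]}$$

(EL$^\pi$-new)
$$\frac{H = H_1; n \mapsto (v, \eta) \quad H \triangleright e \hookrightarrow v \quad k \notin \mathrm{dom}(H) \quad s' = s[\langle n+1, k\rangle / x]}{C, H \triangleright \mathrm{let}\ x = \mathrm{newhide}\ e\ \mathrm{in}\ s \;\to\; C, H; n+1 \mapsto v : k; k \triangleright s'}$$


Monitors are also updated to consider a fixed set of locations (a heap $\mathbf{H_0}$), so
$\mathbf{M} ::= (\{\sigma\cdots\}, \rightsquigarrow, \sigma_0, \mathbf{H_0}, \sigma_c)$. The atomic creation of capabilities is provided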
to match modern security architectures such as Cheri [71] (which implement 
capabilities at the hardware level). This atomicity is not strictly necessary and 
we prove that RSC is attained both by a compiler relying on it and by one that 
allocates a location and then protects it non-atomically. The former compiler 
(with this atomicity in the target) is a bit easier to describe, so for space reasons, 
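we only describe that here and defer the other one to the companion report [61].


4.3 Compiler from $L^\tau$ to $L^\pi$


The high-level structure of the compiler, $\llbracket\cdot\rrbracket^{L^\tau}_{L^\pi}$, is similar to that of our earlier
compiler $\llbracket\cdot\rrbracket^{L^U}_{L^P}$ (Sect. 3.3). However, $\llbracket\cdot\rrbracket^{L^\tau}_{L^\pi}$ is defined by induction on the type
derivation of the component to be compiled. The case for allocation (presented
below) explicitly uses type information to achieve security efficiently, protecting
only those locations whose type is not UN.


$$\left\llbracket \frac{\Delta, \Gamma \vdash e : \tau \qquad C, \Delta, \Gamma; x : \mathrm{Ref}\ \tau \vdash s}{C, \Delta, \Gamma \vdash \mathrm{let}\ x = \mathrm{new}_\tau\ e\ \mathrm{in}\ s} \right\rrbracket^{L^\tau}_{L^\pi} =
\begin{cases}
\mathrm{let}\ x_o = \mathrm{new}\ \llbracket \Delta, \Gamma \vdash e : \tau \rrbracket^{L^\tau}_{L^\pi}\ \mathrm{in}\ \mathrm{let}\ x = \langle x_o, 0\rangle\ \mathrm{in}\ \llbracket C, \Delta, \Gamma; x : \mathrm{Ref}\ \tau \vdash s \rrbracket^{L^\tau}_{L^\pi} & \text{if } \tau = \mathrm{UN} \\
\mathrm{let}\ x = \mathrm{newhide}\ \llbracket \Delta, \Gamma \vdash e : \tau \rrbracket^{L^\tau}_{L^\pi}\ \mathrm{in}\ \llbracket C, \Delta, \Gamma; x : \mathrm{Ref}\ \tau \vdash s \rrbracket^{L^\tau}_{L^\pi} & \text{otherwise}
\end{cases}$$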
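
The type-directed choice made in the allocation case can be sketched in a few lines of Python. This is a string-level illustration only: the function name `compile_alloc`, the representation of types as strings, and the assumption that the sub-terms have already been compiled are not part of the paper.

```python
def compile_alloc(tau, compiled_e, compiled_s):
    """Emit target code for `let x = new e in s`, where e has source type tau.

    `compiled_e` and `compiled_s` stand for the already-compiled
    initialiser and continuation (hypothetical inputs).
    """
    if tau == "UN":
        # Untrusted location: plain allocation, paired with the dummy capability 0.
        return f"let xo = new {compiled_e} in let x = <xo, 0> in {compiled_s}"
    # Trusted location: allocate it together with a fresh protecting capability.
    return f"let x = newhide {compiled_e} in {compiled_s}"
```

Only the second branch pays for capability protection, which is the sense in which type information is used to achieve security efficiently.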


New Monitor Relation. As monitors have changed, we also need a new monitor
relation $M \approx \mathbf{M}$. Informally, a source and a target monitor are related if the target
monitor can always step whenever the target heap satisfies the types specified
in the source monitor (up to renaming by the partial bijection $\beta$).

We write $\vdash H : \Delta$ to mean that for each location $\ell \in \Delta$, $\vdash H(\ell) : \Delta(\ell)$. Given
a partial bijection $\beta$ from source to target locations, we say that a target monitor
$\mathbf{M} = (\{\sigma\cdots\}, \rightsquigarrow, \sigma_0, \mathbf{H_0}, \sigma_c)$ is good, written $\vdash \mathbf{M} : \beta, \Delta$, if for all $\sigma \in \{\sigma\cdots\}$
and all $H \approx_\beta \mathbf{H}$ such that $\vdash H : \Delta$, there is a $\sigma'$ such that $(\sigma, \mathbf{H}, \sigma') \in \rightsquigarrow$. For
a fixed partial bijection $\beta_0$ between the domains of $\Delta$ and $\mathbf{H_0}$, we say that
the source monitor $M$ and the target monitor $\mathbf{M}$ are related, written $M \approx \mathbf{M}$, if
$\vdash \mathbf{M} : \beta_0, \Delta$ for the $\Delta$ in $M$. With this setup, we define RSC as in Sect. 2.


Theorem 5 (Compiler $\llbracket\cdot\rrbracket^{L^\tau}_{L^\pi}$ attains RSC). $\vdash \llbracket\cdot\rrbracket^{L^\tau}_{L^\pi} : \mathit{RSC}$


To prove that $\llbracket\cdot\rrbracket^{L^\tau}_{L^\pi}$ attains RSC we do not rely on a backtranslation. Here,
we know statically which locations can be monitor-sensitive: they must all be
trusted, i.e., must have a type $\tau$ satisfying $\tau \nvdash \circ$. Using this, we set up a simple
cross-language relation and show it to be an invariant on runs of source and
compiled target components. The relation captures the following:


— Heaps (both source and target) can be partitioned into two parts, a trusted 
part and an untrusted part; 

— The trusted source heap contains only locations whose type is trusted ($\tau \nvdash \circ$);

— The trusted target heap contains only locations related to trusted source 
locations and these point to related values; more importantly, every trusted 
target location is protected by a capability; 

— In the target, any capability protecting a trusted location does not occur in
attacker code, nor is it stored in an untrusted heap location.

We need to prove that this relation is preserved by reductions both in compiled
and in attacker code. The former follows from source robust safety (Theorem 4).
The latter is simple since all trusted locations are protected with capabilities,
attackers have no access to trusted locations, and capabilities are unforgeable
and unguessable (by the semantics of $L^\pi$). At this point, knowing that monitors
are related, and that source traces are always accepted by source monitors,
we can conclude that target traces are always accepted by target monitors too.
Note that this kind of an argument requires all compilable source programs to be
robustly safe and is, therefore, impossible for our first compiler $\llbracket\cdot\rrbracket^{L^U}_{L^P}$. Avoiding
the backtranslation results in a proof much simpler than that of Sect. 3.
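
The target-side clauses of the cross-language relation can be phrased as a check on a toy model of the target state. The Python sketch below covers only those clauses (every trusted target location is protected, and no protecting capability appears in attacker code or in an untrusted heap cell); the source-side typing and the relatedness of values are not modelled. The heap encoding, the use of object identity for capabilities, and the name `invariant_holds` are assumptions for illustration.

```python
def contains(value, cap):
    """Does `cap` occur anywhere inside `value` (nested tuples model pairs)?"""
    if value is cap:
        return True
    if isinstance(value, tuple):
        return any(contains(v, cap) for v in value)
    return False


def invariant_holds(heap, trusted, attacker_values):
    """Check the target-side clauses of the invariant on a toy state.

    heap            -- dict: address -> (value, protecting capability or None)
    trusted         -- set of addresses forming the trusted part of the heap
    attacker_values -- values occurring in attacker code
    """
    for addr in trusted:
        _, cap = heap[addr]
        if cap is None:
            return False                 # trusted location left unprotected
        if any(contains(v, cap) for v in attacker_values):
            return False                 # capability occurs in attacker code
        for other, (value, _) in heap.items():
            if other not in trusted and contains(value, cap):
                return False             # capability stored in untrusted heap
    return True


k = object()                             # a capability known to compiled code only
good_heap = {0: (12, k), 1: (5, None)}
leaky_heap = {0: (12, k), 1: ((0, k), None)}
```

Here `good_heap` satisfies the check as long as `k` does not occur in attacker code, whereas `leaky_heap` violates it because the untrusted location 1 stores the capability that protects location 0.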


5 Fully Abstract Compilation 


Our next goal is to compare RSC to FAC at an intuitive level. We first define 
fully abstract compilation or FAC (Sect. 5.1). Then, we present an example of 
how FAC may result in inefficient compiled code and use that to present in
Sect. 5.2 what would be needed to write a fully abstract compiler from $L^U$ to
$L^P$ (the languages of our first compiler). We use this example to compare RSC
and FAC concretely, showing that, at least on this example, RSC permits more
efficient code and affords simpler proofs than FAC.

However, this does not imply that one should always prefer RSC to FAC 
blindly. In some cases, one may want to establish full abstraction for reasons 
other than security. Also, when the target language is typed [10,11,21,50] or has 
abstractions similar to those of the source, full abstraction may have no down- 
sides (in terms of efficiency of compiled code and simplicity of proofs) relative to 
RSC. However, in many settings, including those we consider, target languages 
are not typed, and often differ significantly from the source in their abstractions. 
In such cases, RSC is a worthy alternative.


5.1 Formalising Fully Abstract Compilation 


As stated in Sect. 1, FAC requires the preservation and reflection of observational
equivalence, and most existing work instantiates observational equivalence
with contextual equivalence ($\simeq_{ctx}$). Contextual equivalence and FAC are defined
below. Informally, two components $C_1$ and $C_2$ are contextually equivalent if no
context $A$ interacting with them can tell them apart, i.e., they are indistinguishable.
Contextual equivalence can encode security properties such as confidentiality,
integrity, invariant maintenance and non-interference [6,9,53,60]. We do not
explain this well-known observation here, but refer the interested reader to the
survey of Patrignani et al. [54]. Informally, a compiler $\llbracket\cdot\rrbracket^S_T$ is fully abstract if it
translates (only) contextually-equivalent source components into contextually-equivalent
target ones.


Definition 4 (Contextual equivalence and fully abstract compilation).


$C_1 \simeq_{ctx} C_2 \triangleq \forall A.\ A[C_1]\Uparrow \iff A[C_2]\Uparrow$, where $\Uparrow$ means execution divergence

$\vdash \llbracket\cdot\rrbracket^S_T : \mathit{FAC} \triangleq \forall C_1, C_2.\ C_1 \simeq_{ctx} C_2 \iff \llbracket C_1\rrbracket^S_T \simeq_{ctx} \llbracket C_2\rrbracket^S_T$

The security-relevant part of FAC is the $\Rightarrow$ implication [29]. This part is
security-relevant because the proof thesis concerns target contextual equivalence
($\simeq_{ctx}$). Unfolding the definition of $\simeq_{ctx}$ on the right of the implication yields
a universal quantification over all possible target contexts A, which captures 
malicious attackers. In fact, there may be target contexts A that can interact 
with compiled code in ways that are impossible in the source language. Compilers 
that attain FAC with untyped target languages often insert checks in compiled 
code that detect such interactions and respond to them securely [60], often by 
halting the execution [6,9,29,37,39,42,53,54]. These checks are often inefficient, 
but must be performed even if the interactions are not security-relevant. We now 
present an example of this. 


Example 5 (Wrappers for heap resources). Consider a password manager written
in an object-oriented language that is compiled to an assembly-like language. The
password manager defines a private List object where it stores the passwords
locally. Shown below are two implementations of the newList method inside
List which we call $C_{one}$ and $C_{two}$. The only difference between $C_{one}$ and $C_{two}$ is
that $C_{two}$ allocates two lists internally; one of these (shadow) is used for internal
purposes only.


$C_{one}$:

    public newList(): List {
      ell = new List();
      return ell;
    }

$C_{two}$:

    public newList(): List {
      shadow = new List(); // diff
      ell = new List();
      return ell;
    }


$C_{one}$ and $C_{two}$ are equivalent in a source language that does not allow pointer
comparison (like our source languages). To attain FAC when the target allows
pointer comparisons (as in our target languages), the pointers returned by
newList in the two implementations must be the same, but this is very difficult
to ensure since the second implementation does more allocations. A simple
solution to this problem is to wrap ell in a proxy object and return the
proxy [9,47,53,59]. Compiled code needs to maintain a lookup table mapping
the proxy to the original object and proxies must have allocation-independent
addresses. Proxies work but they are inefficient due to the need to look up the
table on every object access.
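
The proxy idea can be made concrete with a small Python sketch. It is illustrative only: the class `ProxyTable`, the use of table indices as proxies, and the two `new_list_*` functions are assumptions standing in for the compiled versions of the two implementations.

```python
class ProxyTable:
    """Maps small integer proxies to the objects they stand for."""

    def __init__(self):
        self._objects = []               # proxy -> wrapped object

    def wrap(self, obj):
        # The proxy is the number of objects exported so far, so it does not
        # depend on how many private allocations happened beforehand.
        self._objects.append(obj)
        return len(self._objects) - 1

    def unwrap(self, proxy):
        # Every access through a proxy pays for this table lookup.
        return self._objects[proxy]


def new_list_one(table):
    ell = []
    return table.wrap(ell)


def new_list_two(table):
    shadow = []                          # extra private allocation
    ell = []
    return table.wrap(ell)


table_one, table_two = ProxyTable(), ProxyTable()
proxy_one = new_list_one(table_one)
proxy_two = new_list_two(table_two)
```

Both implementations hand the context the same proxy, so a context that compares returned values cannot tell them apart, at the price of an `unwrap` on every later access to the list.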


In this example, FAC forces all privately allocated locations to be wrapped
in proxies. However, RSC does not require this. Our target languages $L^P$ and
$L^\pi$ support address comparison (addresses are natural numbers in their heaps)
but $\llbracket\cdot\rrbracket^{L^U}_{L^P}$ and $\llbracket\cdot\rrbracket^{L^\tau}_{L^\pi}$ just use capabilities to attain security efficiently, while the
compiler described in the companion report relies on memory isolation. On the
other hand, for attaining FAC, capabilities alone would be insufficient since they
do not hide addresses. We explain this in detail in the next subsection.




Remarks. Our technical report lists many other cases of FAC forcing security- 
irrelevant inefficiency in compiled code [61]. All of these can be avoided by just 
replacing contextual equivalence with a different notion of equivalence in the 
statement of FAC. However, it is not clear how this can be done generally for 
any given kind of inefficiency, and what the security consequences of such instan- 
tiations of the statement of FAC are. On the other hand, RSC is uniform and 
it does not induce any of these inefficiencies. 

A security issue that cannot be addressed just by tweaking equivalences 
is information leaks on side channels, as side channels are, by definition, not 
expressible in the language. Neither FAC nor RSC deals with side channels. 


5.2 Towards a Fully Abstract Compiler from $L^U$ to $L^P$


To further compare FAC and RSC, we now sketch what would be needed to
construct a fully abstract compiler from $L^U$ to $L^P$. In particular, this compiler
should not suffer from the “attack” described in Example 5.


Inefficiency. We consider a (hypothetical) new compiler from $L^U$
to $L^P$ that attains FAC; we call it the FAC compiler. We describe informally what code generated by this
compiler would have to do. We know that fully abstract compilation preserves all
source abstractions in the target language. One abstraction that distinguishes
$L^P$ from $L^U$ is that locations are abstract in $L^U$, but concrete natural numbers in
$L^P$. Thus, locations allocated by compiled code must not be passed directly to the
context as this would reveal the allocation order. Instead of passing the location
$\langle n, k\rangle$ to the context, the compiler arranges for an opaque handle $\langle n', k_{com}\rangle$ (that
cannot be used to access any location directly) to be passed. Such an opaque
handle is often called a mask or seal in the literature [66].

To ensure that masking is done properly, the FAC compiler can insert code at entry
and exit points of compiled code, wrapping the compiled code in a way that
enforces masking [32,59]. The wrapper keeps a list L of component-allocated
locations that are shared with the context in order to know their masks. When a
component-allocated location is shared, it is added to the list L. The mask of a
location is its index in this list. If the same location is shared again it is not added
again but its previous index is used. To implement lookup in L we must compare
capabilities too, so we need to add that expression to the target language. To
ensure capabilities do not leak to the context, the second field of the pair is a
constant capability $k_{com}$ which compiled code does not use otherwise. Clearly,
this wrapping can increase the cost of all cross-component calls and returns.
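
The bookkeeping performed by such a wrapper can be sketched in Python as follows. The class `MaskList`, the representation of locations as (address, capability) tuples with string capabilities, and the constant `K_COM` are illustrative assumptions, not code produced by any actual compiler.

```python
K_COM = "k_com"   # constant capability placed in every handle; unused otherwise


class MaskList:
    """The list L of component-allocated locations shared with the context."""

    def __init__(self):
        self._shared = []

    def mask(self, location):
        """Return the opaque handle for `location`, registering it if needed."""
        for index, known in enumerate(self._shared):
            if known == location:        # compares address and capability
                return (index, K_COM)    # shared before: reuse its index
        self._shared.append(location)
        return (len(self._shared) - 1, K_COM)

    def unmask(self, handle):
        """Recover the real location from a handle received from the context."""
        index, _ = handle
        return self._shared[index]


masks = MaskList()
first = masks.mask((7, "k1"))
second = masks.mask((3, "k2"))
again = masks.mask((7, "k1"))
```

The handles reveal only the order in which locations were shared, not their addresses, and the linear scan in `mask` hints at the cost added to every cross-component call and return.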

However, this wrapping is not sufficient to attain FAC. A component- 
allocated location could be passed to the context on the heap, so before passing 
control to the context the compiled code needs to scan the whole heap where 
a location can be passed and mask all found component-allocated locations. 
Dually, when receiving control the compiled code must scan the heap to unmask 
any masked location so it can use the location. The problem now is determining 
what parts of the heap to scan and how. Specifically, the compiled code needs to
keep track of all the locations (and related capabilities) that are shared, i.e., (i)
passed from the context to the component and (ii) passed from the component
to the context. Both keeping track of these locations as well as scanning them
on every cross-component control transfer is likely to be very expensive.

Finally, masked locations cannot be used directly by the context to be read
and written. Thus, compiled code must provide a read and a write function that
implement reading and writing to masked locations. The additional unmasking
in these functions (as opposed to native reads and writes) adds to the inefficiency.

It should be clear that, as opposed to the RSC compiler $\llbracket\cdot\rrbracket^{L^U}_{L^P}$ (Sect. 3), the FAC
compiler just sketched is likely to generate far less efficient code.


Proof Difficulty. Proving that the hypothetical FAC compiler attains FAC can only be done by
backtranslating traces, not contexts alone, since the newly-added target expressions
cannot be directly backtranslated to valid source ones [7,9,59]. For this, we need
a trace semantics that captures all information available to the context. This
is often called a fully abstract trace semantics [38,55,56]. However, the trace
semantics we defined for $L^P$ is not fully abstract, as its actions record the entire
heap in every action, including private parts of the heap. Hence, we cannot use
this trace semantics for proving FAC and so we design a new one. Building a
fully abstract trace semantics for $L^P$ is challenging because we have to keep
track of locations that have been shared with the context in the past. This
substantially complicates both the definition of traces and the proofs that build on
the definition.

Finally, the source context that the backtranslation constructs from a target 
trace must simulate the shared part of the heap at every context switch. Since 
locations in the target may be masked, the source context has to maintain a 
map from the source locations to the corresponding masked target ones, which 
complicates the backtranslation and the proof substantially. 


To summarize, it should be clear that the proof of FAC for the hypothetical FAC
compiler would be much harder than the proof of RSC for $\llbracket\cdot\rrbracket^{L^U}_{L^P}$, even though the source and target
languages are the same and so is the broad proof technique (backtranslation).


6 Related Work 


Recent work [8,33] presents new criteria for secure compilation that ensure 
preservation of subclasses of hyperproperties. Hyperproperties [25] are a for- 
mal representation of predicates on programs, i.e., they are predicates on sets of 
traces. Hyperproperties capture many security-relevant properties including not 
just conventional safety and liveness, which are predicates on traces, but also 
properties like non-interference, which is a predicate on pairs of traces. Modulo 
technical differences, our definition of RSC coincides with the criterion of “robust 
safety property preservation” in [8,33]. We show, through concrete instances, 
that this criterion can be easily realized by compilers, and develop two proof
techniques for establishing it. We further show that the criterion leads to more
efficient compiled code than does FAC. Additionally, the criteria in [8,33] assume
that behaviours in the source and target are represented using the same alphabet.
Hence, the definitions (somewhat unrealistically or ideally) do not require
a translation of source properties to target properties. In contrast, we consider
differences in the representation of behaviour in the source and in the target and
this is accounted for in our monitor relation $M \approx \mathbf{M}$. A slightly different account
of this difference is presented by Patrignani and Garg [60] in the context of 
reactive black-box programs. 

Abate et al. [7] define a variant of robustly-safe compilation called RSCC
specifically tailored to the case where (source) components can perform unde- 
fined behaviour. RSCC does not consider attacks from arbitrary target contexts 
but from compiled components that can become compromised and behave in 
arbitrary ways. To demonstrate RSCC, Abate et al. [7] rely on two backends 
for their compiler: software fault isolation and tag-based monitors. On the other 
hand, we rely on capability machines and memory isolation (the latter in the 
companion report). RSCC also preserves (a form of) safety properties and can 
be achieved by relying on a trace-based backtranslation; it is unclear whether 
proofs can be simplified when the source is verified and concurrent, as in our 
second compiler. 

ASLR [6,37], protected module architectures [9,42,53,59], tagged architec- 
tures [39], capability machines [69] and cryptographic primitives [4,5,22,26] have 
been used as targets for FAC. We believe all of these can also be used as targets 
of RSC-attaining compilers. In fact, some targets such as capability machines 
seem to be better suited to RSC than FAC, as we demonstrated. 

Ahmed et al. prove full abstraction for several compilers between typed lan- 
guages [10,11,50]. As compiler intermediate languages are often typed, and as 
these types often serve as the basis for complex static analyses, full abstraction 
seems like a reasonable goal for (fully typed) intermediate compilation steps. 
In the last few steps of compilation, where the target languages are unlikely to 
be typed, one could establish robust safety preservation and combine the two 
properties (vertically) to get an end-to-end security guarantee. 

There are three other criteria for secure compilation that we would like to 
mention: securely compartmentalised compilation (SCC) [39], trace-preserving 
compilation (TPC) [60] and non-interference-preserving compilation (NIPC) [12, 
15,16,27]. SCC is a re-statement of the “hard” part of full abstraction (the for- 
ward implication), but adapted to languages with undefined behaviour and a 
strict notion of components. Thus, SCC suffers from much of the same efficiency 
drawbacks as FAC. TPC is a stronger criterion than FAC, that most existing 
fully abstract compilers also attain. Again, compilers attaining TPC also suffer 
from the drawbacks of compilers attaining FAC. 

NIPC preserves a single property: noninterference (NI). However, this line of 
work does not consider active target-level adversaries yet. Instead, the focus is 
on compiling whole programs. Since noninterference is not a safety property, it 
is difficult to compare NIPC to RSC directly. However, noninterference can also
be approximated as a safety property [20]. So, in principle, RSC (with adequate
massaging of observations) can be applied to stronger end-goals than NIPC.

Swamy et al. [67] embed an F* model of a gradually and robustly typed
variant of JavaScript into an F* model of JavaScript. Gradual typing supports
constructs similar to our endorsement construct in $L^\tau$. Their type-directed
compiler is proven to attain memory isolation as well as static and dynamic memory
safety. However, they do not consider general safety properties, nor a specific, 
general criterion for compiler security. 

Two of our target languages rely on capabilities for restricting access to sensi- 
tive locations from the context. Although capabilities are not mainstream in any 
processor, fully functional research prototypes such as Cheri exist [71]. Capa- 
bility machines have previously been advocated as a target for efficient secure 
compilation [30] and preliminary work on compiling C-like languages to them 
exists, but the criterion applied is FAC [69]. 


7 Conclusion 


This paper has examined robustly safe compilation (RSC), a soundness criterion 
for compilers with direct relevance to security. We have shown that the criterion 
is easily realizable and may lead to more efficient code than does fully abstract 
compilation with respect to contextual equivalence. We have also presented two techniques
for establishing that a compiler attains RSC. One is an adaptation of an existing
technique, backtranslation, and the other is based on inductive invariants.


Abstract. Software Fault Isolation (SFI) is a security-enhancing
program transformation for instrumenting an untrusted binary module so
that it runs inside a dedicated isolated address space, called a sandbox. 
To ensure that the untrusted module cannot escape its sandbox, exist- 
ing approaches such as Google’s Native Client rely on a binary verifier 
to check that all memory accesses are within the sandbox. Instead of 
relying on a posteriori verification, we design, implement and prove cor- 
rect a program instrumentation phase as part of the formally verified 
compiler COMPCERT that enforces a sandboxing security property a pri- 
ori. This eliminates the need for a binary verifier and, instead, leverages 
the soundness proof of the compiler to prove the security of the sand- 
boxing transformation. The technical contributions are a novel sandbox- 
ing transformation that has a well-defined C semantics and which sup- 
ports arbitrary function pointers, and a formally verified C compiler that 
implements SFI. Experiments show that our formally verified technique 
is a competitive way of implementing SFI. 


1 Introduction 


Isolating programs with various levels of trustworthiness is a fundamental secu- 
rity concern, be it on a cloud computing platform running untrusted code pro- 
vided by customers, or in a web browser running untrusted code coming from 
different origins. In these contexts, it is of the utmost importance to provide 
adequate isolation mechanisms so that a faulty or malicious computation can- 
not compromise the host or neighbouring computations. 

There exist a number of mechanisms for enforcing isolation that intervene at
various levels, from the hardware up to the operating system. Hypervisors [10], 
virtual machines [2] but also system processes [17] can ensure strong isolation 
properties, at the expense of costly context switches and limited flexibility in 
the interaction between components. Language-based techniques such as strong 
typing offer alternative techniques for ensuring memory safety, upon which access 
control policies and isolation can be implemented. This approach is implemented 
e.g. by the Java language for which it provides isolation guarantees, as proved 
by Leroy and Rouaix [21]. The isolation is fine-grained and very flexible but
the security mechanisms, e.g. stack inspection, may be hard to reason about [7].
In the web browser realm, JavaScript is dynamically typed and also ensures
memory safety upon which access control can be implemented [29].

© The Author(s) 2019
F. Besson et al., in L. Caires (Ed.): ESOP 2019, LNCS 11423, pp. 499-524, 2019.
https://doi.org/10.1007/978-3-030-17184-1_18


1.1 Software Fault Isolation 


Software Fault Isolation (SFI) is an alternative for unsafe languages, e.g. C, 
where memory safety is not granted but needs to be enforced at runtime by 
program instrumentation. Pioneered by Wahbe et al. [35] and popularised by 
Google’s Native Client [30,37,38], SFI is a program transformation which confines 
a software component to a memory sandbox. This is done by prefixing 
every memory access with a carefully designed code sequence which efficiently 
ensures that the memory access occurs within the sandbox. In practice, the sandbox 
is aligned and the sandbox addresses are thus of the form $\mathtt{0x}YZ$ where $Y$ is a 
fixed bit-pattern and $Z$ is an arbitrary bit-pattern, i.e., $Z \in [\mathtt{0x}0 \ldots 0, \mathtt{0x}F \ldots F]$. 
Hence, enforcing that memory accesses are within the sandbox range of addresses 
can be efficiently implemented by a masking operation which exploits the binary 
representation of pointers: it retains the lowest bits Z and sets the highest bits 
to the bit-pattern Y. 

Traditionally, the SFI transformation is performed at the binary level and 
is followed by an a posteriori verification by a trusted SFI verifier [23,31,35]. 
Because the verifier can assume that the code has undergone the SFI transforma- 
tion, it can be kept simple (almost syntactic), thereby reducing both verification 
time and the Trusted Computing Base (TCB). This approach to SFI can be 
viewed as a simple instance of Proof Carrying Code [25] where the compiler is 
untrusted and the binary verifier is either trusted or verified. 

Traditional SFI is well suited for executing binary code from an untrusted 
origin that must, for an adequate user experience, start running as soon as 
possible. Google’s Native Client [30,37] is a state-of-the-art SFI implementation 
which has been deployed in the Chrome web browser for isolating binary code in 
untrusted pages. ARMor [39] features the first fully verified SFI implementation 
where the TCB is reduced to the formal ARM semantics in the HOL proof- 
assistant [9]. RockSalt [24] is a formally verified implementation of an SFI verifier 
for the x86 architecture, demonstrating that an efficient binary verifier can be 
obtained from a machine-checked specification. 


1.2 Software Fault Isolation Through Compilation 


A downside of the traditional SFI approach is that it hinders most compiler opti- 
misations because the optimised code no longer respects the simple properties 
that the SFI verifier is capable of checking. For example, the SFI verifier expects 
that every memory access is immediately preceded by a specific syntactic code 
pattern that implements the sandboxing operation. A semantically equivalent 
but syntactically different code sequence would be rejected. An alternative to 
the a posteriori binary verifier approach is Portable Software Fault Isolation 
(PSFI), proposed by Kroll et al. [16]. In this methodology, there is no verifier 


Compiling Sandboxes: Formally Verified Software Fault Isolation 501 


to trust. Instead isolation is obtained by compilation with a machine-checked 
compiler, such as COMPCERT [18]. Portability comes from the fact that PSFI 
can reuse existing compiler back-ends and therefore target all the architectures 
supported by the compiler without additional effort. 

PSFI is applicable in scenarios where the source code is available or the 
binary code is provided by a trusted third-party that controls the build process. 
For example, the original motivation for Proof Carrying Code [25] was to pro- 
vide safe kernel extensions [26] as binary code to replace scripts written in an 
interpreted language. This falls within the scope of PSFI. Another PSFI scenario 
is when the binary code is produced in a controlled environment and/or by a 
trusted party. In this case, the primary goal is not to protect against an attacker 
trying to insert malicious code but to prevent honest parties from exposing a 
host platform to exploitable bugs. This is the case e.g. in the avionics industry, 
where software from different third-parties is integrated on the same host that 
needs to ensure strong isolation properties between tasks whose levels of criti- 
cality differ. In those cases, PSFI can deliver both security and a performance 
advantage. In Sect. 8, we provide experimental evidence that PSFI is competitive 
and sometimes outperforms SFI in terms of efficiency of the binary code. 


1.3 Challenges in Formally Verified SFI 


PSFI inserts the masking operations during compilation and does away with 
the a posteriori SFI verifier. The challenge is then to ensure that the security, 
enforced at an intermediate representation of the code, still holds for the run- 
ning code. Indeed, compiler optimisation often breaks such security [33]. The 
insight of Kroll et al. is that a safety theorem of the compiled code (i.e., that its 
behaviour is well-defined) can be exploited to obtain a security theorem for that 
same compiled code, guaranteeing that it makes no memory accesses outside its 
sandbox. We explain this in more detail in Sect. 2.2. 

One challenge we face with this approach is that it is far from evident that 
the sandboxing operations and hence the transformed program have well-defined 
behaviour. An unsafe language such as C admits undefined behaviours (e.g. bit- 
wise operations on pointers), which means that it is possible for the observational 
behaviour of a program to differ depending on the level of optimisation. This is 
not a compiler bug: compilers only guarantee semantics preservation if the code 
to compile has a well-defined semantics [36]. Therefore, our SFI transformation 
must turn any program into a program with a well-defined semantics. 

The seminal paper of Kroll et al. emphasises that the absence of unde- 
fined behaviour is a prerequisite but they do not provide a transformation that 
enforces this property. More precisely, their transformation may produce a program 
with undefined behaviours (e.g. because the input program had undefined 
behaviours). This fact was one of the motivations for the present work, and 
explains the need for a new PSFI technique. One difficulty is to remove unde- 
fined behaviours due to restrictions on pointer arithmetic. For example, bitwise 
operators on pointers have undefined C semantics, but traditional masking oper- 
ations of SFI rely heavily on these operators. Another difficulty is to deal with 
indirect function calls and ensure that, as prescribed by the C standard, they 
are resolved to valid function pointers. To tackle these problems, we propose an 
original sandboxing transformation which unlike previous proposals is compliant 
with the C standard [13] and therefore has well-defined behaviour. 


1.4 Contributions 


We have developed and proved correct COMPCERTSFI, the first full-fledged, fully 
verified implementation of SFI inside a C compiler. The SFI transformation is 
performed early in the compilation chain, thereby permitting the generated code 
to benefit from existing optimisations that are performed by the back-end. The 
technical contributions behind COMPCERTSFI can be summarised as follows. 


— An original design and implementation of the SFI transformation based on 
well-defined pointer arithmetic and which supports function pointers. This 
novel design of the SFI transformation is necessary for the safety proof. 

— A machine-checked proof of the security and safety of the SFI transforma- 
tion. Our formal development is available online [1]. 

— A small, lightweight runtime system for managing the sandbox, built using a 
standard program loader and configured by compiler-generated information. 

— Experimental evidence demonstrating that the portable SFI approach is com- 
petitive and sometimes even outperforms traditional SFI, in particular state- 
of-the-art implementations of (P)Native Client. 


The rest of the paper is organised as follows. In Sect. 2, we present background 
information about the COMPCERT compiler (Sect. 2.1) and the PSFI approach 
(Sect. 2.2). Section 3 provides an overview of the layout of the sandbox and the 
masking operations implementing our SFI. In Sect. 4 we explain how to overcome 
the problem with undefined pointer arithmetic and define masking operations 
with a well-defined C semantics. Section 5 describes how control-flow integrity in 
the presence of function pointers can be achieved by a slightly more flexible SFI 
policy which allows reads in well-defined areas outside the sandbox. Section 6 
specifies the SFI policy in more detail, and describes the formal Coq proofs 
of safety and security. Section 7 presents the design of our runtime library and 
how it exploits compiler support. Experimental results are detailed in Sect. 8. 
Section 9 presents related work and Sect. 10 concludes. 


2 Background 


This section presents background information about the COMPCERT compiler 
[18] and the Portable Software Fault Isolation proposed by Kroll et al. [16]. 


2.1 COMPCERT 


The COMPCERT compiler [18] is a machine-checked compiler programmed and 
proved correct using the Coq proof-assistant [22]. It compiles C programs down 
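\[
\begin{array}{lrcl}
\mathit{constant} & \ni c & ::= & i_{32} \mid i_{64} \mid f_{32} \mid f_{64} \mid \&gl \mid \&stk \\
\mathit{chunk} & \ni \kappa & ::= & is_{8} \mid iu_{8} \mid is_{16} \mid iu_{16} \mid i_{32} \mid i_{64} \mid f_{32} \mid f_{64} \\
\mathit{expr} & \ni e & ::= & x \mid c \mid \triangleright e \mid e_1 \mathbin{\Box} e_2 \mid [e]_\kappa \\
\mathit{stmt} & \ni s & ::= & \mathbf{skip} \mid x := e \mid [e_1]_\kappa := e_2 \mid \mathbf{return}\ e \mid x := e(e_1, \ldots, e_n)_\sigma \\
 & & \mid & \mathbf{if}\ e\ \mathbf{then}\ s_1\ \mathbf{else}\ s_2 \mid s_1; s_2 \mid \mathbf{loop}\ s \mid \{s\} \mid \mathbf{exit}\ n \mid \mathbf{goto}\ lb
\end{array}
\]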


Fig. 1. CMINOR syntax 


to assembly code through a succession of compiler passes which are shown to be 
semantics preserving. COMPCERT features an architecture independent front- 
end. The back-end supports four main architectures: x86, ARM, PowerPC and 
RiscV. To target all the back-ends without additional effort, our secure trans- 
formation is performed in the compiler front-end, at the level of the CMINOR 
language that is the last architecture-independent language of the COMPCERT 
compiler chain. Our transformation can obviously be applied on C programs by 
first compiling them into CMINOR, and then applying the transformation itself. 

The CMINOR language is a minimal imperative language with explicit stack 
allocation of certain local variables [19]. Its syntax is given in Fig. 1. Constants 
range over 32-bit and 64-bit integers but also IEEE floating-point numbers. 
It is possible to get the address of a global variable gl or the address of the 
stack allocated local variables (i.e., stk denotes the address of the current stack 
frame). In COMPCERT parlance, a memory chunk $\kappa$ specifies how many bytes 
need to be read (resp. written) from (resp. to) memory and whether the result 
should be interpreted as a signed or unsigned quantity. For instance, the memory 
chunk $is_{16}$ denotes a 16-bit signed integer and $f_{64}$ denotes a 64-bit floating-point 
number. In CMINOR, memory accesses, written $[e]_\kappa$, are annotated with the 
relevant memory chunk $\kappa$. Expressions are built from pseudo-registers, constants, 
unary ($\triangleright$) and binary ($\Box$) operators. COMPCERT features the relevant unary and 
binary operators needed to encode the semantics of C. Expressions are side-effect 
free but may contain memory reads. 

Instructions are fairly standard. Similarly to a memory read, a memory store 
$[e_1]_\kappa := e_2$ is annotated by a memory chunk $\kappa$. In CMINOR, a function call such 
as $e(e_1, \ldots, e_n)_\sigma$ represents an indirect function call through a function pointer 
denoted by the expression $e$; $\sigma$ is the signature of the function and $e_1, \ldots, e_n$ are 
the arguments. A direct call is a special case where the expression $e$ is a constant 
(function) pointer. CMINOR is a structured language and features a conditional, 
a block construct {s} and an infinite loop loop s. Exiting the n-th enclosing loop 
or block can be done using an exit n instruction. CMINOR is structured but 
gotos towards a symbolic label lb are also possible. Returning from a function is 
done by a return instruction. CMINOR is equipped with a small-step operational 
semantics. The intra-procedural and inter-procedural control flows are modelled 
using an explicit continuation which therefore contains a call stack. 


CompCert Soundness Theorem. Each compiler pass is proved to be 
semantics preserving using a simulation argument. Theorem 1 states semantics 
preservation. 
Theorem 1 (Semantics Preservation). If the compilation of program p succeeds 
and generates a target program tp, then for any behaviour beh of program 
tp there exists a behaviour of p, beh’, such that beh improves beh’. 


In this statement, a behaviour is a trace of observable events that are typi- 
cally generated when performing external function calls. COMPCERT classifies 
behaviours depending on whether the program terminates normally, diverges or 
goes wrong. A going-wrong behaviour corresponds to a situation where the program 
semantics gets stuck (i.e., has an undefined behaviour). In this situation, 
the compiler has the liberty to generate a program with an improved behaviour 
i.e., the semantics of the transformed program may be more defined (i.e., it may 
not get stuck at all or may get stuck later on). 

The consequence is that Theorem 1 is not sufficient to preserve a safety prop- 
erty because the target program tp may have behaviours that are not accounted 
for in the program p and could therefore violate the property. Corollary 1 states 
that in the absence of going-wrong behaviour, the behaviours of the target pro- 
gram are a subset of the behaviours of the source program. 


Corollary 1 (Safety preservation). Let p be a program and tp be a target 
program. Consider that none of the behaviours of p is a going-wrong behaviour. 
If the compilation of p succeeds and generates a target program tp, then any 
behaviour of program tp is a behaviour of p. 


As a consequence, any (safety) property of the behaviours of p is preserved by 
the target program tp. In Sect. 2.2, we show how the PSFI approach leverages 
Corollary 1 to transfer an isolation property obtained at the CMINOR level to 
the assembly code. 


Going-wrong behaviours in CompCert. As safety is an essential property 
of our PSFI transformation, we give below a detailed account of the going-wrong 
behaviours of the COMPCERT languages with a focus on CMINOR. 


Undefined evaluation of expressions. COMPCERT’s runtime values are dynami- 
cally typed and defined below: 


$\mathit{values} \ni v ::= \mathbf{undef} \mid \mathtt{int}(i_{32}) \mid \mathtt{long}(i_{64}) \mid \mathtt{single}(f_{32}) \mid \mathtt{float}(f_{64}) \mid \mathtt{ptr}(b, o)$ 


Values are built from numeric values (32-bit and 64-bit integers and floating point 
numbers), the undef value representing an indeterminate value, and pointer 
values made of a pair $(b, o)$ where $b$ is a memory block identifier and $o$ is an 
offset which, depending on the architecture, is either a 32-bit or a 64-bit integer. 
For CMINOR, like all languages of COMPCERT, the unary ($\triangleright$) and binary 
($\Box$) operators are not total. They may directly produce going-wrong behaviours 
e.g. in case of division by int(0). They may also return undef if (i) the arguments 
are not in the right range e.g. the left-shift int(i) << int(32); or (ii) 
the arguments are not well-typed e.g. $\mathtt{int}(i) +_{int} \mathtt{float}(f)$. Pointer arithmetic 
is strictly conforming to the C standard [13] and any pointer operation that is 
implementation-defined according to the standard returns undef. 
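\[
\begin{array}{lcll}
\mathtt{ptr}(b,o) + \mathtt{long}(l) & = & \mathtt{ptr}(b, o + l) & \\
\mathtt{ptr}(b,o) - \mathtt{ptr}(b,o') & = & \mathtt{long}(o - o') & \\
\mathtt{ptr}(b,o) \mathbin{!=} \mathtt{long}(0) & = & tt & \text{if } W(b,o) \\
\mathtt{ptr}(b,o) \mathbin{==} \mathtt{long}(0) & = & ff & \text{if } W(b,o) \\
\mathtt{ptr}(b,o) \star \mathtt{ptr}(b,o') & = & o \star o' & \text{if } W(b,o) \wedge W(b,o') \\
\mathtt{ptr}(b,o) \mathbin{==} \mathtt{ptr}(b',o') & = & ff & \text{if } b \neq b' \wedge V(b,o) \wedge V(b',o') \\
\mathtt{ptr}(b,o) \mathbin{!=} \mathtt{ptr}(b',o') & = & tt & \text{if } b \neq b' \wedge V(b,o) \wedge V(b',o')
\end{array}
\]

where $\star \in \{<, \leq, ==, \geq, >, !=\}$ 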


Fig. 2. Pointer arithmetic in COMPCERT 


The precise semantics of pointer operations is given in Fig. 2. For simplicity, 
we provide the semantics for a 64-bit architecture. Pointer operations are often 
only defined provided that the pointers are valid, written V, or weakly valid, 
written W. The validity condition requires that the offset o of a pointer ptr(b, o) 
is strictly within the bounds of the block b. The weakly valid condition refers 
to a pointer whose offset is either valid or one-past-the-end of the block b. Any 
pointer arithmetic operation that is not listed in Fig. 2 returns undef. This is 
in particular the case for bitwise operations which are typically used for the 
masking operation needed to implement SFI. 
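
As an illustration of how partial these operators are, the following C sketch models a small fragment of the rules of Fig. 2. It is a toy model under stated assumptions, not COMPCERT code: the type `value` and the functions `v_long`, `v_ptr`, `v_add`, `v_sub` and `v_and` are hypothetical names, only 64-bit integers and pointers are modelled, and the validity conditions V and W are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of dynamically typed values (hypothetical, not CompCert code). */
typedef enum { V_UNDEF, V_LONG, V_PTR } tag_t;

typedef struct {
    tag_t tag;
    uint64_t n;    /* payload when tag == V_LONG */
    int block;     /* block identifier when tag == V_PTR */
    uint64_t off;  /* offset when tag == V_PTR */
} value;

value v_undef(void) { value v = { V_UNDEF, 0, 0, 0 }; return v; }
value v_long(uint64_t n) { value v = { V_LONG, n, 0, 0 }; return v; }
value v_ptr(int block, uint64_t off) {
    assert(block > 0);  /* this toy model numbers blocks from 1 */
    value v = { V_PTR, 0, block, off };
    return v;
}

/* ptr(b,o) + long(l) = ptr(b,o+l); long + long is ordinary addition. */
value v_add(value a, value b) {
    if (a.tag == V_PTR && b.tag == V_LONG) return v_ptr(a.block, a.off + b.n);
    if (a.tag == V_LONG && b.tag == V_LONG) return v_long(a.n + b.n);
    return v_undef();
}

/* ptr(b,o) - ptr(b,o') = long(o-o') only when both pointers share a block. */
value v_sub(value a, value b) {
    if (a.tag == V_PTR && b.tag == V_PTR && a.block == b.block)
        return v_long(a.off - b.off);
    if (a.tag == V_LONG && b.tag == V_LONG) return v_long(a.n - b.n);
    return v_undef();
}

/* Bitwise and is defined on integers only; any pointer operand gives undef. */
value v_and(value a, value b) {
    if (a.tag == V_LONG && b.tag == V_LONG) return v_long(a.n & b.n);
    return v_undef();
}
```

In this model, masking a pointer with `v_and` yields undef, whereas the same mask applied to an integer is defined; this is the obstacle that the masking operation of SFI runs into.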

The indeterminate value undef is not per se a going-wrong behaviour. Yet, 
branching over a test evaluating to undef, performing a memory access over an 
undef address and returning undef from the main function are going-wrong 
behaviours. 


Memory accesses are ruled by a unified memory model [20] that is used through- 
out the whole compiler. The memory is made of a collection of separated blocks. 
For a given block, each offset o below the block size is given a permission 
$p \in \{r, w, \ldots\}$ and contains a memory value 


$\mathit{mval} \ni mv ::= \mathbf{undef} \mid \mathtt{byte}(b) \mid [\mathtt{ptr}(b, o)]_n$ 


where b is a concrete byte value and $[\mathtt{ptr}(b, o)]_n$ represents the nth byte of the 
pointer ptr(b, o) for $n \in \{1 \ldots 8\}$. A memory write $\mathit{storev}(\kappa, m, a, v)$ is only 
defined if the address a is a pointer ptr(b, o) to an existing block b such that 
the memory locations $(b, o), \ldots, (b, o + |\kappa| - 1)$ have the permission w and the 
offset o satisfies the alignment constraint of $\kappa$. A memory read $\mathit{loadv}(\kappa, m, a)$ 
is only defined under similar conditions with the additional restriction that not 
reading all the consecutive fragments of a pointer returns undef. 
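
The definedness conditions for a memory write can be sketched as a simple check. The C code below is an illustrative model only: `block_t`, `perm_t`, `store_is_defined` and the example block `demo_block` are hypothetical names, a missing block is modelled by a null pointer, and permissions are simplified to an exact match on the write permission.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { PERM_NONE, PERM_READ, PERM_WRITE } perm_t;

typedef struct {
    size_t size;         /* number of bytes in the block */
    const perm_t *perm;  /* one permission per offset */
} block_t;

/* Returns true when a store of chunk_size bytes at offset off of blk would be
   defined: the block exists, off is aligned, and every byte in the range
   [off, off + chunk_size) exists and is writable. */
bool store_is_defined(const block_t *blk, uint64_t off,
                      size_t chunk_size, size_t align) {
    assert(align > 0 && chunk_size > 0);
    if (blk == NULL) return false;              /* no such block */
    if (off % align != 0) return false;         /* alignment constraint */
    if (off > blk->size || chunk_size > blk->size - off) return false;
    for (size_t i = 0; i < chunk_size; i++)
        if (blk->perm[off + i] != PERM_WRITE) return false;
    return true;
}

/* Example block: 16 bytes, the first 8 writable, the last 8 read-only. */
const perm_t demo_perm[16] = {
    PERM_WRITE, PERM_WRITE, PERM_WRITE, PERM_WRITE,
    PERM_WRITE, PERM_WRITE, PERM_WRITE, PERM_WRITE,
    PERM_READ,  PERM_READ,  PERM_READ,  PERM_READ,
    PERM_READ,  PERM_READ,  PERM_READ,  PERM_READ
};
const block_t demo_block = { 16, demo_perm };
```

For instance, an 8-byte store at offset 0 of `demo_block` is defined, while the same store at offset 4 (misaligned) or offset 8 (read-only bytes) is not.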


Control-flow transfers may go wrong if the target of the control-flow transfer is 
not well-defined. Hence, a goto lb instruction goes wrong if, in the current function, 
there is no statement labelled by lb; and an exit n instruction goes wrong 
if there are fewer than n enclosing blocks around the statement containing the 
exit instruction. A conditional if e then s1 else s2 goes wrong if the expression 
e does not evaluate to int(i) for some i. Also, the execution goes wrong if the 
last statement of a function is not a return instruction. Last but not least, a 
function call $x := e(e_1, \ldots, e_n)_\sigma$ goes wrong if the expression e does not evaluate 
to a pointer ptr(b, 0) where b is a function pointer with signature $\sigma$. 

We show in Sect. 4 how our transformation ensures that pointer arithmetic 
and memory accesses are always well-defined. Section 5 shows how we make sure 
indirect calls are always correctly resolved. Section 6 shows that, together with 
other statically checkable verifications, our PSFI transformation rules out all 
possible going-wrong behaviours. 


2.2 Portable Software Fault Isolation 


Kroll, Stewart and Appel have pioneered the concept of Portable Software Fault 
Isolation (PSFI) [16] whereby SFI is enforced by a pass of the compiler front-end 
that is architecture independent. The main expected advantage is that isolation 
is implemented, once and for all, for any target architecture. Moreover, the gen- 
erated code is optimised by the back-end passes of the compiler. Compared to 
traditional SFI, there is no architecture-specific binary verifier but instead the 
compiler enters the TCB. The key insight of Kroll et al. is to leverage a formally 
verified compiler, namely COMPCERT, to transfer a security proof of isolation 
obtained at the CMINOR level through the compiler back-end, with minimal 
proof effort. In the following, we recall the basic properties that a CMINOR 
SFI transformation needs to satisfy so that isolation holds at assembly level. 

In COMPCERT’s terms, the sandbox is identified by a dedicated memory 
block sb. A CMINOR program is secure (Property 1) under the condition that all 
its memory accesses are performed within the sandbox. 


Property 1 (Program security). A CMINOR program p is secure if all its memory 
accesses are within the sandbox block sb. 


After compilation, the assembly code is secure if its observable behaviours are 
the same as the observable behaviours of the CMINOR program. In order to 
apply COMPCERT’s semantics preservation theorem (more precisely Corollary 1), 
it remains to ensure that the CMINOR program has a well-defined semantics 
(Property 2). 


Property 2 (Program safety). A CMINOR program p is safe if all its behaviours 
are well-defined, i.e., not wrong. 


Kroll et al. state Property 1 by means of an instrumented CMINOR seman- 
tics which gets stuck in case of memory accesses outside the sandbox. They 
prove formally that the additional semantic safeguards are never triggered for a 
transformed program. 

Kroll et al. also sketch some necessary steps to prove the Property 2 of safety 
but do not propose a formal proof. This leaves open a number of challenging 
issues such as whether it is feasible to define a masking operation that has a 
defined CMINOR semantics and how to deal with indirect function calls through 
function pointers. More generally, the work leaves open whether a formal proof 
of Property 2 on safety is possible given the restrictions of CompCert’s semantics 
(notably pointer arithmetic) and without relying on axioms asserting properties 
of an external masking primitive. One of the central contributions of this work 
is to provide a positive answer to this question and propose solutions to these 
issues where neither the sandboxing of memory accesses nor the sandboxing 
of function pointers is part of a TCB. The transformation that circumvents 
the limitations imposed by pointer arithmetic is original and, we surmise, is 
a necessary component to transfer security down to assembly. (For a precise 
comparison with Kroll et al., see Sect. 9.) 


3 A Thread-Aware Sandbox 


The memory address space of a C program is partitioned into a runtime stack 
of frames, a heap and a dedicated space for global variables. The address space 
of a sandboxed program is re-organised to fit into a single global variable, sb, 
where the global variables, the heap and the stack frames are relocated. Figure 3a 
depicts the memory layout of the program after our SFI transformation. Each 
global variable is relocated and allocated in the sandbox at a given offset, and 
each global memory access of the program is translated into a memory access in 
the sandbox. For managing the heap it suffices to use a sandbox-aware malloc 
implementation that allocates memory inside the sandbox. 

To prevent buffer overflows, a standard approach consists in introducing a so-called 
shadow stack that is used to store the function stack frames. Our implementation 
supports multi-threaded applications and therefore there are as many shadow stacks 
as there are threads. Upon thread creation, we allocate a novel shadow stack in the 
sandbox. The shadow-stack pointer is passed as an additional argument to each function 
call. This is efficient when arguments are passed by register, with the only drawback 
of reserving an additional register. Frames are allocated by incrementing the shadow- 
stack pointer at function entry. All accesses to the original stack are then translated into 
accesses to the sandbox shadow stack. The following Example 1 and the code snippet 
in Fig. 3 illustrate the essence of the transformation. 


(a) Layout of memory, from top to bottom: heap; shadow stack; shadow stack; global variables 

(b) Original CMINOR 

    g = {long(5)}; 

    long foo(){ 
      stk[8]; 
      bar(g, &stk); 
      return ([&stk]); 
    } 

(c) Sandboxed CMINOR 

    sb[2^k] = {long(5);...}; 

    long foo(sp){ 
      sp1 = sp + 8; 
      bar(sp1, [&sb], sp); 
      return ([sp]); 
    } 


Fig. 3. Sandbox transformation 
Example 1. The CMINOR program of Fig. 3b declares a global variable g initialised to 
the 64-bit integer 5. The function foo allocates a stack frame of 8 bytes that will be 
used to store a 64-bit local variable. By convention, the current stack frame is called 
stk. The function foo calls the function bar with as arguments the value of g and the 
address of the local variable stk; and returns the value, presumably updated by bar, 
of the local variable. 

Syntactically, the program of Fig. 3c only performs memory accesses on the global 
sandbox variable sb. The size of sb is $2^k$ for some predefined k. At thread 
creation, a shadow stack is allocated by our sandbox-aware malloc in the sandbox after 
the statically allocated global variables. For our program, the unique global variable g 
is stored at offset 0 and spans over 8 bytes. Therefore, the initial value of the shadow- 
stack pointer sp is 8. After the transformation, the function foo reserves the space 
for the local variable stk by incrementing the pseudo-register sp. The function bar 
is called with the incremented shadow-stack pointer sp1, the value stored at offset 0 
in the sandbox (i.e., the value of the global variable g) and the address of the local 
variable stk which is given by the value of the stack pointer sp. At function exit, the 
value of the local variable stk is returned by dereferencing the shadow-stack pointer sp. 
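
The shape of the sandboxed program of Example 1 can be mimicked in C as follows. This is a sketch under stated assumptions rather than the actual output of the transformation: the sandbox size (`K`), the helpers `sb_load` and `sb_store`, the body of `bar` and the driver `run_example` are all hypothetical. No masking is performed yet; the assertions merely stand for accesses that would get stuck outside the bounds of the sandbox.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define K 16                    /* assumed: the sandbox holds 2^K bytes */
unsigned char sb[1u << K];      /* the single sandbox variable */

/* Hypothetical helpers: 64-bit load and store at a sandbox offset. */
int64_t sb_load(uint64_t off) {
    int64_t v;
    assert(off <= sizeof sb - sizeof v);
    memcpy(&v, sb + off, sizeof v);
    return v;
}

void sb_store(uint64_t off, int64_t v) {
    assert(off <= sizeof sb - sizeof v);
    memcpy(sb + off, &v, sizeof v);
}

/* Made-up body for bar: it stores g + 1 into the local at offset p. */
void bar(uint64_t sp, int64_t g, uint64_t p) {
    (void)sp;                   /* bar needs no frame of its own here */
    sb_store(p, g + 1);
}

/* Sandboxed foo: the frame is reserved by bumping the shadow-stack pointer. */
int64_t foo(uint64_t sp) {
    uint64_t sp1 = sp + 8;      /* reserve 8 bytes for the local stk */
    bar(sp1, sb_load(0), sp);   /* g lives at offset 0; &stk becomes sp */
    return sb_load(sp);
}

/* Stores g = 5 at offset 0, then runs foo with the initial
   shadow-stack pointer 8, as in Example 1. */
int64_t run_example(void) {
    sb_store(0, 5);
    return foo(8);
}
```

With this made-up `bar`, `run_example()` returns 6: the global at offset 0 keeps the value 5 and the local at offset 8 holds the value written by `bar`.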


Our SFI transformation enforces the isolation security policy stipulating that all 
memory accesses are performed within the sandbox sb—at the CMINOR level. However, 
this holds because the semantics gets stuck (i.e., the semantics goes wrong) whenever 
the program performs an access outside the bounds of the sandbox. As explained earlier, 
the compiler is free to translate this into an insecure program that would escape the 
sandbox at runtime. To get a formal security guarantee, it is necessary to transform 
further the CMINOR program to rule out any behaviour that goes wrong i.e., ensure 
Property 2. Given the numerous undefined behaviours of the C language, ruling out any 
going-wrong behaviour may seem a daunting task. In general, this requires ensuring 
both memory safety and control-flow integrity. The following two sections describe how 
we can exploit the SFI transformation and the knowledge that all memory accesses are 
inside the sandbox to ensure both memory safety and control-flow integrity. 


4 Memory-Safe Masking 


For SFI, memory safety is obtained by making sure that every memory access is per- 
formed inside the sandbox. Starting from an analysis of the standard SFI solution, we 
present our own design which satisfies the additional requirements of being compliant 
with the semantic restrictions of COMPCERT and with a strict interpretation of the C 
standard. 


4.1 Standard SFI Masking of Addresses 


Standard SFI transformations ensure memory safety by masking memory accesses. The 
gist of it is to allocate a sandbox sb of size $2^k$ at a $2^k$-aligned memory address, say $\&sb = tag \times 2^k$. 
Under those constraints, enforcing that an address $A$ is within the bounds 
of the sandbox can essentially be done by replacing the high-address bits by those of 
tag. Using bitwise operations, this can be done by the expression $(A \mathbin{\&} (2^k - 1)) \mid tag \times 2^k$, 
where & is the bitwise and and | is the bitwise or. More visually, this can be written 
$(A \mathbin{\&} 1 \cdots 1) \mid tag\,0 \cdots 0$. 
At binary level, this masking transformation is defined and the cost is modest: two 
bitwise operations. However, this masking operation has no well-defined C semantics. 
This is also the case for the semantics of COMPCERT and in particular for the CMINOR 
language. The reason is twofold: bitwise operations over pointer values return undef 
and concrete addresses (e.g. $tag \times 2^k$) are not pointers for COMPCERT where they are 
represented by a block and an offset (see Fig. 2). 
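
For reference, the binary-level masking can be sketched in C on plain 64-bit integers standing for machine addresses. The sketch deliberately avoids pointer types, since the same operations applied to pointer values are exactly what has no well-defined semantics here; the function name `sfi_mask` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Standard SFI mask on a raw 64-bit address: keep the k low bits of addr
   and force the high bits to those of the sandbox base tag * 2^k. */
uint64_t sfi_mask(uint64_t addr, uint64_t tag, unsigned k) {
    assert(k > 0 && k < 64);
    assert((tag >> (64 - k)) == 0);            /* tag * 2^k must fit in 64 bits */
    uint64_t low = addr & ((UINT64_C(1) << k) - 1);
    return low | (tag << k);
}
```

An address that already lies in the sandbox is left unchanged, and any other address is redirected to the sandbox location with the same k low bits.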


4.2 Specialised Masking for 32-Bit Sandboxes 


For 32-bit sandboxes, there exists a variant of the sandboxing primitive which has the 
advantages (1) that the sandbox address does not need to be aligned; (2) that the cost 
of masking may be reduced to a single instruction. In its simplest form, the masking 
primitive is defined by 

$\&sb + (A - \&sb)_{64 \to 32 \to 64}$ 


where &sb is the symbolic address of the sandbox. The subtraction of &sb extracts 
the offset of the pointer and the double (unsigned) cast $64 \to 32 \to 64$ has the effect 
of truncating the offset to a 32-bit quantity that is therefore within the bounds of a 
32-bit sandbox. At first sight, this masking is less efficient than the standard masking 
but it is efficient for typical address computations which require both displacement and 
scaling (e.g. $A = t + k + k' \times i_{32 \to 64}$ where $t$ is a 64-bit address, $k$ and $k'$ are constants 
and $i$ is a 32-bit integer). Assuming that each cast or arithmetic operation is mapped 
to a single instruction¹, the masked address A can be computed using 8 instructions: 
4 instructions for computing the address A and 4 more for the sandboxing primitive. 
Using simple properties of modular arithmetic, it is possible to distribute the $64 \to 32$ 
cast over addition and multiplication to obtain the following equivalent formulation of 
the sandboxed address: 


$\&sb + A'_{32 \to 64}$ with $A' = t_{64 \to 32} + c_1 + c_2 \times i$ 


where $c_1$ and $c_2$ are compile-time constants: $c_1 = (k - \&sb)_{64 \to 32}$ and $c_2 = k'_{64 \to 32}$. 
Using this formulation, the address $A'$ still requires 4 instructions but the cost of the 
sandboxing is reduced to 2 instructions making it on par with the standard sandboxing. 
On x86, 32-bit registers are just zero-extended 64-bit registers. Therefore, the cast 
$A'_{32 \to 64}$ is actually redundant and the overhead induced by the sandboxing is reduced 
to a single instruction. Our experiments (see Sect. 8.2) validate the practical advantage 
of this encoding. 
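
The equivalence of the two formulations follows from arithmetic modulo 2^32 and can be checked on integers. The C sketch below models addresses as `uint64_t` values; the function names and the parameter `kp` (standing for k') are hypothetical, the cast of i is taken to be a zero-extension, and unsigned int is assumed to be 32 bits wide so that `uint32_t` arithmetic wraps modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Direct form: sb + (A - sb) truncated to 32 bits, with A = t + k + kp * i. */
uint64_t mask32_direct(uint64_t sb, uint64_t t, uint64_t k,
                       uint64_t kp, uint32_t i) {
    uint64_t a = t + k + kp * (uint64_t)i;      /* cast, mul, add, add */
    return sb + (uint64_t)(uint32_t)(a - sb);   /* sub, two casts, add */
}

/* Distributed form: sb + A' zero-extended, with A' = t32 + c1 + c2 * i. */
uint64_t mask32_distributed(uint64_t sb, uint64_t t, uint64_t k,
                            uint64_t kp, uint32_t i) {
    uint32_t c1 = (uint32_t)(k - sb);  /* a constant in the setting of the text */
    uint32_t c2 = (uint32_t)kp;        /* likewise */
    uint32_t a1 = (uint32_t)t + c1 + c2 * i;    /* arithmetic modulo 2^32 */
    return sb + (uint64_t)a1;                   /* cast, add */
}

/* Checks on one input that both forms agree and that the result lies
   within 2^32 bytes of the sandbox base. */
void check_mask32(uint64_t sb, uint64_t t, uint64_t k,
                  uint64_t kp, uint32_t i) {
    uint64_t r = mask32_direct(sb, t, k, kp, i);
    assert(r == mask32_distributed(sb, t, k, kp, i));
    assert(r - sb <= UINT32_MAX);
}
```

A call such as `check_mask32(0x7f0000001000, 0xdeadbeefcafef00d, 7, 12, 42)` exercises both forms on an address far outside the sandbox.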

Still, as for the standard sandboxing, this sandboxing primitive has no semantics 
in COMPCERT due to the limitations of pointer arithmetic. As a consequence, the 
solution of Kroll et al. [16] does not give actual code for the masking primitive, but 
rather axiomatises its behaviour as an external function. This prevents optimisations 
such as common subexpression elimination or function inlining from happening and 
induces the cost of a function call for each memory access. 


4.3 Towards Well-Defined Pointer Arithmetic 


To illustrate the limitations of pointer arithmetic, we examine the semantic behaviour 
of the standard sandboxing primitive (the specialised sandboxing primitive has similar 
issues). The standard sandboxing primitive can be written $(A \mathbin{\&} (2^k - 1)) \mid \&sb$ where &sb 
is the address of the sandbox variable. If sb is allocated at runtime at address $tag \times 2^k$ 
for some tag, this formulation is equivalent at binary level. Again, this heavily relies 
on pointer arithmetic that is undefined and on information about where the sandbox 
is linked at runtime. 

¹ Some architectures have rich addressing modes allowing for more compact encodings. 

Consider the alternative formulation $(A \mathbin{\&} (2^k - 1)) + \&sb$ where the bitwise | is 
replaced by a +. This formulation has the advantage that incrementing a pointer, 
here sb, is well-defined (see Fig. 2). Since, on modern hardware, both addition and bitwise 
operations take a single cycle, the difference in efficiency should be negligible. Moreover, 
at least for x86, the addition can be compiled into the addressing mode. 

Still, this does not solve our issue. To understand this, suppose that A is a pointer. 
In this case, the bitwise &, whose purpose is to extract the pointer offset, is still undefined. 
Therefore, the whole expression $(A \mathbin{\&} (2^k - 1)) + \&sb$ is undefined. Because dereferencing 
an undefined expression is a going-wrong behaviour, the compiled program 
may have an arbitrary runtime behaviour and escape the sandbox. A prerequisite for 
our masking primitive is therefore to ensure that the evaluation is defined i.e., different 
from undef. As all the semantic operators of COMPCERT are strict in undef (if any 
argument is undef, so is the result), a necessary condition is that A is not undef. As 
A can be obtained from any expression, a challenge is to ensure that every expression 
evaluates to a defined value. A particular difficulty is that the many undefined pointer 
operations (see Fig. 2) cannot be detected by runtime checks. 


4.4 Arithmetisation of the Heap 


To tackle this challenge and ensure that every computation is defined, we propose 
an original and radical approach which ensures syntactically that pointers are neither 
stored in memory nor in local variables. As a result, the program is only manipulating 
integer values and memory addresses are only constructed by the sandboxing primi- 
tives. This approach implies, as a side-effect, that our previously undefined masking 
primitives are defined. Let asb be the runtime address of the symbolic address &sb of 
the sandbox. The masking of an address A can be written 


A’ + &sb 


where A’ is defined either by $A' = A \& (2^k - 1)$ or by $A' = (A - asb)_{64 \to 32 \to 64}$. As A is 
necessarily an integer, A’ is necessarily a defined integer and therefore A’ + &sb returns 
a defined pointer ptr(sb, o) that is necessarily inside the sandbox. 

An additional subtlety is that memory accesses are indexed by a memory chunk $\kappa$ 
which mandates an alignment constraint (e.g. the chunk i64 mandates an 8-byte aligned 
address). As a result, the masking primitive is parameterised by the chunk $\kappa$ and the 
masking primitive for i64 is $A' \& msk_{i64} + \&sb$ where $msk_{i64} = (2^{k-3} - 1) \times 2^3$. 
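
As an illustration only, the following C sketch models the chunk-parameterised masking on plain 64-bit integers. The sandbox size exponent SB_BITS, the function names and the integer model of the sandbox base asb are assumptions made for this example; they are not taken from the COMPCERTSFI implementation, where the final addition is performed on the symbolic address &sb.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameter: the sandbox spans 2^SB_BITS bytes (k = SB_BITS). */
#define SB_BITS 16u

/* Mask for an access aligned on 2^log2_align bytes:
   (2^(k - a) - 1) * 2^a, which keeps bits a .. k-1 of the offset. */
static uint64_t chunk_mask(unsigned log2_align) {
    assert(log2_align < SB_BITS);
    return ((UINT64_C(1) << (SB_BITS - log2_align)) - 1) << log2_align;
}

/* Mask the integer address a and rebase it on the sandbox base asb.
   Since a is an integer, every step is ordinary unsigned arithmetic. */
static uint64_t mask_addr(uint64_t a, uint64_t asb, unsigned log2_align) {
    return (a & chunk_mask(log2_align)) + asb;
}
```

With SB_BITS set to 16, an 8-byte access uses the mask 0xFFF8, so any input address is mapped to an 8-byte aligned address between asb and asb + 0xFFF8.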

Only computing over numeric values is facilitated by the fact that the sandboxed 
program is only manipulating pointers relative to a single object, the sandbox. Therefore, 
a solution could be to only compute with pointer offsets. This is not totally 
satisfactory because the null pointer (i.e., 0) would be indistinguishable from the base 
pointer ptr(sb, 0). Instead, we use the integer asb that is the integer runtime address 
of the sandbox (i.e., we have asb = &sb) and perform the following transformation t 
over program expressions. 


$t(\&sb) = asb$ 

$t(c) = c$ for $c \in \{i32, i64, f32, f64\}$ 

$t(\triangleright e) = \blacktriangleright t(e)$ 

$t(e_1 \mathbin{\square} e_2) = t(e_1) \mathbin{\blacksquare} t(e_2)$ 

$t([e]_\kappa) = [msk_\kappa(t(e))]_\kappa$ 


The operators $\blacktriangleright$ and $\blacksquare$ ensure that, if the expressions are well-typed, they never return 
the undef value. Typical operators that need such a treatment include division, modulus, and bitwise shifts. We 
transform expressions so that they evaluate to an arbitrary value when their original 
semantics is undefined. For example, we transform the left-shift operations on 32-bit 
integers so that the resulting expression always has a shift amount less than 32: 


$a \ll b \;\leadsto\; a \ll (b \mathbin{\&} 31)$. 


Similarly, we transform divisions and modulus in the following way, to rule out the 
undefined cases of division by zero and signed division of MIN_SIGNED by -1: 


$a / b \;\leadsto\; (a + (a == MIN\_SIGNED \mathbin{\&} b == -1)) / (b + (b == 0))$. 


We can prove that the resulting division expression is always defined. Most of the other 
expressions are always defined and do not need further transformations. 
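
The two rewritings above can be sketched in C as follows. This is a minimal illustration for 32-bit integers, with hypothetical function names; it mirrors the shape of the transformed expressions, not the actual code generated by our transformation.

```c
#include <assert.h>
#include <stdint.h>

/* a << b  ~>  a << (b & 31): the shift amount is always below 32. */
static uint32_t safe_shl32(uint32_t a, uint32_t b) {
    return a << (b & 31u);
}

/* a / b  ~>  (a + (a == MIN_SIGNED & b == -1)) / (b + (b == 0)) */
static int32_t safe_div32(int32_t a, int32_t b) {
    /* The numerator is bumped by one exactly in the overflowing case. */
    int32_t num = a + (int32_t)((a == INT32_MIN) & (b == -1));
    /* A zero divisor is replaced by one. */
    int32_t den = b + (int32_t)(b == 0);
    /* Neither undefined case of signed division can occur any more. */
    assert(den != 0 && !(num == INT32_MIN && den == -1));
    return num / den;
}
```

In the two formerly undefined cases the result is arbitrary but defined: dividing by zero yields the numerator itself, and dividing MIN_SIGNED by -1 yields the largest signed value.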


5 Enforcement of Control-Flow Integrity 


Correct sandboxing of code requires some degree of control-flow integrity. Existing 
SFI implementations enforce a weak form of control-flow integrity which only ensures 
that jumps are aligned and within a sandbox of code. This is achieved by inserting a 
masking operation before indirect jumps, that will mask the target address to ensure 
that the jump is within the sandbox. Additional padding with no-ops is inserted to 
ensure that all the instructions are indeed aligned [30,37,38]. We enforce a stronger, 
more traditional, form of control-flow integrity where any control-flow transfer has a 
well-defined CMINOR semantics. 


5.1 Relaxation of the CMINOR SFI Property 


Intraprocedural control-flow integrity is ensured by simple syntactic checks. For 
instance, they ensure that a goto lb has a corresponding label lb and that an exit n 
has at least n enclosing blocks. The semantics of CMINOR prescribes that function calls 
and returns necessarily match. For this to still hold at the assembly level where the 
return address is explicitly stored in the stack frame, it is sufficient to prove that the 
CMINOR program has no going-wrong behaviour. To ensure control-flow integrity, the 
only remaining issue is due to indirect calls through function pointers. Our control-flow 
integrity counter-measure implements software trampolines and ensures that an indirect 
call with signature $\sigma$ can only be resolved by a function pointer towards a function 
with signature $\sigma$. 

For this purpose, the existing CMINOR SFI security policy, i.e., Property 1, which 
rules out any memory access outside the sandbox, is too restrictive. As we shall see, 
the implementation of trampolines necessitates controlled memory reads, outside the 
sandbox, within compiler-generated variables. To accommodate this extension, we 
propose a slightly relaxed SFI security property which, in addition to memory accesses 
inside the sandbox, authorises other memory reads in read-only regions. 


Property 3. A CMINOR program is secure if all its memory accesses are within either 
the sandbox block sb or some read-only memory. 


This relaxed property still ensures the integrity of the runtime because all memory 
writes are confined to the sandbox. Note that Property 3 and Property 1 are equivalent 
if the trusted runtime library has no read-only memory. This can be achieved at modest 
cost by slightly modifying the source code to remove the C type qualifier const, which 
instructs the compiler that the memory is read-only. 


5.2 Control-Flow Integrity of Indirect Calls 


In Sect. 4, we have glossed over the presence of function pointers. They actually perfectly 
fit our strategy of encoding pointers by integers. In this case, each function pointer is 
encoded as an index and the trampoline code translates the index into a valid function 
pointer. 

Consider a function f of signature $\sigma$ and suppose that the function pointer &f 
is compiled into the index i. The reverse mapping from indexes to function pointers 
is obtained from a compiler-generated array variable $A_\sigma$ such that $A_\sigma[i] = \&f$. 
The array variable $A_\sigma$ is made of all the function pointers with signature $\sigma$. The 
array variable is also padded with a default function pointer such that its length 
is a power of two. At the call site, the instruction $e(e_1, \ldots, e_n)_\sigma$ is transformed into 
$[te \& msk_\sigma + \&A_\sigma](te_1, \ldots, te_n)_\sigma$ where $te, te_1, \ldots, te_n$ are transformed expressions such 
that all memory accesses are masked and $msk_\sigma$ is the binary mask ensuring that the 
index te is within the bounds of the variable $A_\sigma$. In our actual implementation, we optimise 
direct calls and in this case bypass the trampoline. Therefore, when the expression 
e is a constant pointer &f to an existing function with signature $\sigma$, we directly generate 
$(\&f)(te_1, \ldots, te_n)$. As a result, only C code using indirect calls goes through the 
trampoline code. 
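
The trampoline scheme can be pictured with the following C sketch for a single signature. The table contents, the padding function trap and all identifiers are hypothetical and chosen for the example; the actual arrays are generated by the compiler at the CMINOR level.

```c
#include <assert.h>
#include <stdint.h>

/* One signature sigma = int(int, int). */
typedef int (*binop_fn)(int, int);

static int add(int x, int y) { return x + y; }
static int mul(int x, int y) { return x * y; }
static int sub(int x, int y) { return x - y; }
/* Default entry used to pad the table to a power-of-two length. */
static int trap(int x, int y) { (void)x; (void)y; return 0; }

/* Hypothetical table A_sigma: every entry has signature sigma. */
#define TABLE_LEN 4u
static const binop_fn table_binop[TABLE_LEN] = { add, mul, sub, trap };

/* Indirect call through the trampoline: the function "pointer" is an
   integer index, and masking keeps it within the bounds of the table. */
static int call_binop(uint32_t index, int x, int y) {
    assert((TABLE_LEN & (TABLE_LEN - 1u)) == 0u); /* length is a power of two */
    return table_binop[index & (TABLE_LEN - 1u)](x, y);
}
```

Whatever integer the sandboxed code supplies as index, the call resolves to some function of the expected signature; an out-of-range index such as 5 simply aliases entry 1.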

Though our implementation only exploits the relaxation of Property 3 for the sake of 
trampolines, a more aggressive implementation could sometimes avoid relocating read-only 
memory inside the sandbox. This could have a positive impact on optimisations 
which exploit the immutability of read-only memory. 


6 Safety and Security Proofs 


We next give an overview of our fully verified Coq proof of security and safety. 


6.1 Security Proof 


Property 3 is an informal formulation of our security property that is formally stated as 
a CMINOR instrumented semantics. This semantics mimics the CMINOR semantics with 
the exception that memory accesses are restricted: a memory read is either performed 
within the sandbox or in a read-only memory region; a memory write is necessarily 
performed within the sandbox. 

The goal of the security proof is to show that all the memory accesses abide by 
the restrictions of the instrumented semantics. This is stated by Theorem 2 which 
establishes that for a transformed program tp, no behaviour of the standard CMINOR 
semantics gets stuck for the instrumented CMINOR semantics. 


Theorem 2 (Security). For any transformed program tp, every behaviour of tp in the 
standard semantics of CMINOR is also a behaviour of tp in the instrumented semantics. 


The proof is based on the standard technique of forward simulation that is used in 
COMPCERT to ensure the preservation of semantics by compiler passes. Here, the forward 
simulation has the distinctive feature of relating the same (transformed) program 
equipped with a standard and an instrumented semantics. Since the only difference 
between the two semantics is that memory accesses must be secure, the crux of the 
proof lies in the correctness of the masking primitive, as stated in the following lemma. 


Lemma 1. For any masked expression e, if e evaluates to some pointer ptr(b, o), then 
b is the block of the sandbox, i.e., sb. 


The proof relies on the definition of the masking primitive: a masked expression e is 
of the form e’ + &sb. Since &sb evaluates to the pointer ptr(sb, 0), then if the whole 
expression evaluates to a pointer ptr(b, o), necessarily b = sb. 


6.2 Safety Proof 


In order to benefit from COMPCERT’s semantic preservation theorem and transport 
our security proof to the compiled assembly program, we must also prove that the 
sandboxed program is safe, i.e., it never gets stuck. We address all the going-wrong 
behaviours that we enumerated in Sect. 2.1. The well-formedness properties of a pro- 
gram (calling only defined functions, accessing only defined variables, jumping only 
to defined labels, exiting from no more blocks than currently enclosed in) are checked 
statically and make the transformation fail if they are violated. Next, the memory 
accesses require the addresses to be valid and adequately aligned: our masking oper- 
ation ensures that this is always the case. Then, the evaluation of expressions must 
always be defined: this has mostly been dealt with by the arithmetisation of the memory 
(Sect. 4.4). Finally, function calls should always be performed with the appropriate 
number of well-typed arguments. This is easy to check statically for direct function 
calls, but requires trampolines (as described in Sect. 5.2) for indirect function calls. 
The following sandbox invariant encapsulates all these conditions. 


Definition 1 (Sandbox Invariant). A state S of program P satisfies the sandbox 
invariant if the following conditions are satisfied: 


1. indirect control-flow transfers are well-defined in P (e.g. goto instructions in the 
functions of P only jump to defined labels); 

2. every function of P ends with an explicit return; 

3. every function of P is well-typed; 

4. every function of P starts by explicitly initialising its local variables; 

5. the global array $A_\sigma$ for signature $\sigma$ contains function pointers to functions of 
signature $\sigma$; 

6. the environment for local variables and the memory in S only contain properly 
initialised, numerical values. 


Properties 1, 2, 3 are ensured by a set of syntactic checks over the bodies of all the 
functions of the program. Property 4 is enforced by our function transformation which 
inserts assignments that explicitly initialise all declared local variables. Property 5 is 
ensured by construction of the arrays for function pointers. All these properties can 
be established solely on the program body and do not change during the execution of 
the program. By contrast, Property 6 cannot be checked statically and depends on the 
state of the program at each point. 


Safe Evaluation of Expressions. A necessary condition for the safe evaluation of 
expressions is that the program is well typed. COMPCERT does not generate these type 
guarantees so we have integrated a verified (simple) type-inference algorithm for CMINOR 
programs. Type-checking alone is not sufficient to rule out undefined behaviours 
of C operators, but together with the transformations explained in Sect. 4.4, we prove 
the following lemma about the evaluation of transformed expressions. 


Lemma 2 (Safe evaluation of expressions). In a memory state and a well-typed 
environment for local variables containing only defined numerical values, the transfor- 
mation of any well-typed expression e evaluates to a defined numerical value. 


Lemma 2 follows directly from the properties of our expression transformation. 


Safety of Calls through Trampolines. As mentioned in Sect. 5, we implement 
software trampolines to secure function calls through function pointers. To ensure the 
safety of indirect function calls, we maintain a map smap from function signatures 
to the corresponding array identifier and the length of this array. The proof of safety 
relies on the fact that for every function f of signature $\sigma$ present in a program, we 
have $smap(\sigma) = (A_\sigma, l_\sigma)$ such that all offsets lower than $l_\sigma$ in $A_\sigma$ contain a pointer 
to a function of signature $\sigma$. The safety proof of indirect calls itself is not hard, but 
we need to set up this signature map and establish invariants relating it to the global 
environment of the program. 


Safety Theorem. Considering the invariants defined in Definition 1, we prove 
Lemma 3 which is our main technical result. 


Lemma 3 (Safety). For any CMINOR program state S that satisfies the invariants, 
either S is a final state or there exists a sequence of steps from S to some S’ such that 
S’ also satisfies the invariants. 


A subtlety of the proof is that at function entry, the local variables carry the value 
undef and therefore the sandbox invariant only holds after they have been initialised 
by a sequence of assignments (see Property 4 of Definition 1). 

Using Lemma 3, we can show Property 2, in the form of Theorem 3. 


Theorem 3 (Safety of the transformation). All behaviours of the transformed 
program are well-defined, i.e., not wrong. 


Proof. A going-wrong behaviour occurs precisely when a state is reached, from which 
no further step can be taken, though it is not a final state. Lemma 3, together with a 
proof that the initial state of the transformed program satisfies the invariants, tells us 
that no such reachable state exists, concluding the proof. 


As a result, we benefit from COMPCERT’s semantic preservation theorem and can 
transport the security proof down to the assembly program. 


Theorem 4 (Security of the compiled program). Let p be a transformed CMINOR 
program. If p compiles into the assembly program tp, then tp is secure. 


The proof uses Corollary 1 and Theorem 2 to conclude that the behaviours of tp are 
the same as those of p, and hence secure. 


7 SFI Runtime and Library 


Our modified COMPCERT compiler, COMPCERTSFI, takes as input a C program unit in 
the form of a list of C files. Each C file is first compiled down to the CMINOR language 
using the existing passes of the COMPCERT compiler. Then, all the CMINOR programs 
are syntactically linked [14] together to form the program unit to be isolated inside the 
sandbox. COMPCERTSFI comes with a lightweight runtime and generic support for 
interfacing with a trusted library (e.g. a libC). A distinctive feature of our approach is that 
the runtime uses a standard program loader. Moreover, the runtime gets some of 
its configuration through compiler-generated variables. 


7.1 Loading the SFI Application 


The sandboxed code is linked with our runtime library by a linker script which specifies 
where to load at runtime the sb variable, viewed as the data segment. The compiler 
also emits a sandbox configuration map which contains the symbolic address of the 
sandbox, its numeric value at runtime, the total size of the sandbox and the range of 
addresses reserved for global variables. 

Our runtime code is executed before starting the sandboxed main function. It first 
checks that the sandbox is properly linked according to the sandbox configuration map, 
sets the shadow-stack pointer and initialises the sandbox heap using our sandbox-aware 
implementation of malloc based on ptmalloc3². 

By construction, our runtime stack is free of buffer overruns. Yet, if the recursion 
is too deep, the stack may overflow. Therefore, the runtime inserts an unmapped guard 
page at the bottom of the stack and intercepts the segmentation fault. This protection 
suffices provided that the size of each function stack frame does not exceed a page, 
which can be checked at compile-time. Finally, after copying its arguments inside 
the sandbox, the runtime calls the main function of the sandboxed application. 


7.2 Monitoring Calls to the Runtime Library 


The runtime library is trusted and therefore part of the TCB. To ensure isolation, each 
call towards the runtime library is monitored to check the validity of the arguments. 
For this purpose, a call to a library function, say foo, is renamed in the object file into a 
call to a function sb_foo which sanitises its arguments before actually calling the function 
foo. The verifications are library-specific but usually straightforward to implement. For 
stdio, the FILE structures are allocated by the runtime outside of the sandbox. Hence, 
the returned FILE* cannot be dereferenced to corrupt the FILE structure. To prevent 
the sandboxed program from forging FILE* pointers, the runtime maintains at all times the 
set of valid FILE*. For variadic functions, e.g., printf, we statically compile the format 
into a sequence of safe primitive calls. (We reject programs using formats computed 
at runtime.) For functions in string, we check beforehand that the range of memory 
accesses is within the range of the sandbox. We also allow callbacks and therefore a 
runtime function may take a function pointer as argument. To ensure that the function 
is valid, the runtime uses the trampoline programming pattern presented in Sect. 5.2. 
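
To make the range check for string functions concrete, here is a minimal C sketch of a wrapper in the sb_foo style for memmove. The sandbox array, its size, the buffer runtime_private and the choice of returning NULL on a rejected call are assumptions made for the example; the behaviour of the actual runtime on an invalid argument is not described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SB_SIZE 4096u
static unsigned char sandbox[SB_SIZE];        /* stand-in for the sandbox block */
static unsigned char runtime_private[16];     /* stand-in for memory outside it */

/* True if the n bytes starting at p lie entirely inside the sandbox. */
static bool in_sandbox(const void *p, size_t n) {
    uintptr_t lo = (uintptr_t)sandbox;
    uintptr_t a = (uintptr_t)p;
    /* Written so that no intermediate computation can wrap around. */
    return n <= SB_SIZE && a >= lo && a - lo <= SB_SIZE - n;
}

/* Hypothetical wrapper: validate both ranges, then call the real memmove.
   Returning NULL on rejection is an assumption of this sketch. */
static void *sb_memmove(void *dst, const void *src, size_t n) {
    if (!in_sandbox(dst, n) || !in_sandbox(src, n)) {
        return NULL;
    }
    return memmove(dst, src, n);
}
```

The check is performed once per call on the whole range, so the trusted library function itself runs unmodified.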


2 http://www.malloc.de/malloc/ptmalloc3-current.tar.gz. 


7.3 Communication via Global Variables 


Programs may not only communicate via function calls but also directly via global 
variables. For the libC, this includes e.g. stdout or errno. To ensure isolation, COM- 
PCERTSFI relocates those variables inside the sandbox but also generates a global 
variable map which is an array variable of the form 


$\{\&n_1, o_1, \ldots, \&n_i, o_i, \ldots, \&n_m, o_m\}$ 


where $\&n_i$ is the symbolic address of a global variable and $o_i$ is its offset in the sandbox. 
Using this information, the runtime has the ability to synchronise the values of the 
variables inside and outside the sandbox. For example, at program startup, the value 
of stdout (a stream pointer) is copied inside the sandbox at the relevant offset. This 
allows the sandboxed program to call stdio functions but protects the integrity of the 
stream. For errno, it is the responsibility of each runtime library call to synchronise 
the value of errno in the sandbox. 
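
The use of the global variable map by the runtime can be sketched in C as follows. This is an illustration under stated assumptions: the entry layout (in particular the size field, which the map above does not contain), the variable names and the offsets are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SB_SIZE 256u
static unsigned char sandbox[SB_SIZE];   /* stand-in for the sandbox block */

/* Stand-ins for library globals living outside the sandbox. */
static int  outer_errno = 0;
static long outer_counter = 0;

/* One map entry: address outside the sandbox, size (an addition of this
   sketch), and offset of the relocated copy inside the sandbox. */
struct gvar_entry { void *outside; size_t size; size_t offset; };

static const struct gvar_entry gvar_map[] = {
    { &outer_errno,   sizeof outer_errno,   16 },
    { &outer_counter, sizeof outer_counter, 32 },
};

/* Copy the current outside value of entry i to its sandbox location. */
static void sync_into_sandbox(size_t i) {
    const struct gvar_entry *e = &gvar_map[i];
    assert(e->offset + e->size <= SB_SIZE);
    memcpy(sandbox + e->offset, e->outside, e->size);
}
```

A library wrapper that may set errno would call sync_into_sandbox on the corresponding entry before returning to the sandboxed code.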


8 Experiments 


We have evaluated our PSFI approach over the COMPCERT benchmark suite and a port 
of QUAKE. All the experiments have been carried out on a quad-core Intel 6600U laptop 
at 2.6 GHz with 16 GB of RAM running Linux Fedora 27. For QUAKE, we explain 
how to adapt the code to our runtime library and verify the absence of noticeable 
slowdown. For the other benchmarks, we make a more detailed performance evaluation 
and compare COMPCERTSFI with COMPCERT, GCC, CLANG but also the state-of-the- 
art (P)NaCl implementation of SFI. In our experiments, all the benchmarks are ordered 
by increasing running time. Moreover, for computing a runtime overhead, the running 
time is obtained by taking the harmonic mean of 3 consecutive runs. 


8.1 Porting Quake 


QUAKE engines come in various flavours and we use the tyr-quake³ implementation 
linking with X11. The port requires the addition of several functions to our runtime 
library from XLIB and the LIBC. Most of them are not problematic and require no or 
little modification. For instance, the getopt function which is used to parse command- 
line options is using the global variables optarg, optind, opterr, and optopt. As 
explained in Sect. 7.3, the runtime library copies the values of these variables at reserved 
places inside the sandbox. 

Other functions, e.g. gethostbyname, allocate memory on their own and return a 
pointer to this piece of data which is therefore not accessible to the sandboxed code. For 
the specific case of gethostbyname, the library provides the function gethostbyname_r 
which, instead of allocating memory, takes as argument a data-structure that is filled 
by the function. In our case, we pass as argument a sandbox allocated piece of memory. 
This does not solve our problem entirely as inner pointers may still point outside the 
sandbox. To cope with this issue, we perform a deep copy of the relevant piece of data 
inside the sandbox. 

A last issue is that the video memory is shared between the application and the X 
server using the system call shmat. Fortunately, the libC provides the relevant flags to 
bind shared memory at a specific address. Hence, we were able to allocate it inside the 
sandbox, thus allowing seamless communication with the X server. After these modifications, 
the sandboxed QUAKE runs without noticeable slowdown, which is encouraging 
and an indication of the good overall performance of our sandboxing technique. 
In the following, we complement this with a more precise runtime evaluation for the 
COMPCERT benchmarks. 


3 https://disenchant.net/git/tyrquake.git. 


8.2 PSFI Overhead: Impact of Sandboxing Primitives 


Next, we compare the efficiency of a standard masking primitive (Sect. 4.1) with a 
specialised version for 32-bit sandboxes (Sect. 4.2). 

Figure 4 shows the overhead of the standard sandboxing primitive with respect to 
the specialised sandboxing primitive. There are 6 benchmarks for which the overhead 
incurred by the standard sandboxing is above 10%, reaching 40% for 2 benchmarks. 
These cases illustrate the significant performance advantage that is sometimes obtained 
by the specialised sandboxing. For some benchmarks, the standard sandboxing outperforms 
our optimised sandboxing. Yet when it does, it is by a very small margin (below 
3%). Overall, for the vast majority of our benchmarks, the specialised sandboxing 
primitive is very competitive. 

In Sect. 4.2, we gave theoretical arguments for the advantage of the specialised 
sandboxing. Another argument comes from the fact that the specialised sandboxing 
is easier to optimise. First, note that the standard and the specialised sandboxing 
primitives are both using a bitwise mask but for different purposes. For the standard 
primitive, it is used to enforce that the pointer is within the sandbox bounds but 
also to enforce alignment constraints. For the specialised primitive, it is only used to 
enforce alignment constraints. Using the existing COMPCERT dataflow framework, we 
have implemented an alignment analysis that is quite effective at removing redundant 
alignment masks. To enable more optimisations, we make alignment constraints explicit in 
the CMINOR program (e.g. by specifying that function arguments of a pointer 
type are necessarily aligned). Thus, our experimental results are explained by both the 
theoretical advantages given in Sect. 4.2 and the effectiveness of our alignment analysis. 


Fig. 4. Overhead of standard w.r.t. specialised sandboxing 




8.3 PSFI Overhead: Impact of Compiler Back-End 


As a second experiment, we evaluate the overhead of our PSFI transformation for various 
compilers: COMPCERT, GCC and CLANG. COMPCERT is a moderately optimising com- 
piler and the benchmarks run significantly faster using GCC and CLANG. In Fig. 5, the 
baseline is given by the minimum of the execution times of the three compilers without 
PSFI instrumentation. The black bar is the overhead of a compiler (e.g. COMPCERT), 
with respect to the baseline and the grey bar is the overhead of the same compiler but 
with the PSFI transformation (e.g. COMPCERTSFI). In order to use GCC and CLANG, we 
implement a trusted decompiler from our secured CMINOR programs to CLIGHT, a subset 
of C in COMPCERT. These CLIGHT programs are then compiled with GCC or CLANG. 

For a fair comparison, we should compare programs for which we actually have 
a reasonable security guarantee. We have a formal proof of security and safety (see 
Sect. 6) for the sandboxed CMINOR program, and we are confident that our syntax-directed 
decompiler preserves this property. For COMPCERT, this would suffice to preserve 
the security of the compiled CLIGHT code, but this is not the case for GCC and 
CLANG because of semantic discrepancies between the compilers. To limit this risk, 
we have set the compiler flags to instruct GCC and CLANG to adhere to the specifics 
of the COMPCERT semantics: signed integer arithmetic is defined and wraps 
around (flag -fwrapv), strict aliasing is irrelevant (flag -fno-strict-aliasing), and 
floating-point arithmetic is strictly IEEE 754 compliant (flags -frounding-math and 
-fsignaling-nans). We also instruct the compilers to ignore any knowledge about the 
C library (-fno-builtin). 

Our experimental results are shown in Fig. 5. In Fig. 5a, we have the overhead of 
COMPCERT and COMPCERTSFI. The overhead of COMPCERT over GCC and CLANG is 
expected and corroborates existing results⁴. For 10% of the benchmarks, the overhead 
of COMPCERTSFI over COMPCERT is negligible and sometimes the PSFI transformation 
even improves performance. Those are programs for which the PSFI transformation 
introduces few masking operations, if any. For 41% of the benchmarks, the overhead is 
below 10% and can be considered, for most applications, a reasonable efficiency/security 
trade-off. For all the other benchmarks except binarytrees and vmach, the overhead is 
below 25%. The two remaining benchmarks have a significant overhead reaching 82% 
for binarytrees. This corresponds to programs which are memory intensive and where 
sandboxing cannot be optimised. 

In Fig. 5b and c, we perform the same experiments but with GCC and CLANG. The 
results have some similarities but also visible differences. For about 60% of the 
benchmarks the overhead is below 20%. Moreover, for both compilers, the average overhead 
is similar: 22% for GCCSFI and 24% for CLANGSFI. Yet, on average GCCSFI does 
a better job at optimising our benchmarks and beats CLANGSFI for about 75% of the 
benchmarks. For the rest of the benchmarks, we observe a significant overhead, up to 
20%, indicating that the PSFI transformation hinders certain aggressive optimisations. 
The results also seem to indicate that optimisations are fragile as the overhead is not 
always consistent across compilers. The case of the integr benchmark is particularly 
striking because it runs with negligible overhead for CLANGSFI but exhibits the worst-case 
overhead for GCCSFI. The integr program uses a function pointer inside a loop 
and we suspect that GCCSFI, unlike CLANGSFI, fails to optimise the program due to the 
inserted trampoline code. Though less striking, the benchmarks fftw and raytracer 
follow the opposite trend; these are programs where the overhead of CLANGSFI is much 
higher than that of GCCSFI. 


4 http://compcert.inria.fr/compcert-C.html#perfs. 


Fig. 5. Overhead of PSFI: COMPCERT, CLANG, GCC, (P)NaCl 


8.4 PSFI Versus (P)NaCl 


We also compare our compiler-based SFI approach with (P)NaCl [30], which to our 
knowledge is one of the most mature implementations of SFI. Figure 5d shows the 
overhead of COMPCERTSFI, GCCSFI, CLANGSFI with respect to (P)NaCl. The baseline 
is given by the best among NaCl and PNaCl. The best of CLANGSFI and GCCSFI is 
given in dark grey and COMPCERTSFI is given in light grey. 

We first analyse the results of COMPCERTSFI. Our benchmarks are ordered by 
increasing runtime. The first 5 benchmarks have a runtime below one second. They are 
not representative of the performance of both approaches but only illustrate the fact 
that (P)NaCl has a startup penalty due to the verification of the binary and the setup 
of the sandbox. The overhead peaks above 75% for two programs (i.e., fib and integr). 
As the PSFI transformation keeps fib unmodified and only inserts a trampoline call in 
integr, these programs only highlight the limited optimisations performed by COMPCERT. 
Of the remaining benchmarks, 40% run faster or have similar speed 
with COMPCERTSFI. For those benchmarks, the average overhead of COMPCERTSFI 
w.r.t. (P)NaCl is around 9%. Except for a few programs whose overhead skyrockets 
due to COMPCERT not being specialised for speed, we can say that COMPCERTSFI 
performance is comparable to (P)NaCl, with programs running faster on both 
sides and a large number having similar results. 

We also matched GCCSFI/CLANGSFI against (P)NaCl to compare the impact on 
performance of more aggressive optimisations. Here 60% of the programs are faster 
with GCCSFI/CLANGSFI. Among the remaining programs, lzw and chomp are programs 
for which the (P)NaCl code runs faster than the optimised GCC/CLANG code without 
the PSFI transformation. As (P)NaCl is based on CLANG, more investigation is needed 
to understand this paradox, which may be explained by code running outside the sandbox, 
i.e. the trusted runtime library. Among the remaining benchmarks, binarytrees 
and lists still show a noticeable overhead. Those are recursive micro-benchmarks for 
which our PSFI is costly (see Fig. 5). For lists, 99% of the time is spent in a tight loop 
where only a single address is masked. For binarytrees, 70% of the time is spent in the 
runtime code of malloc and free and therefore this highlights the fact that our implementation 
is less efficient than the (P)NaCl counterpart. Overall these results indicate 
that our implementation of SFI is competitive with (P)NaCl, given similar compilers. 
Furthermore, speed can be improved with more sandbox-dedicated optimisations; these 
would be harder for (P)NaCl to check. 


9 Related Work 


Since Wahbe et al. [35] proposed their initial technique for SFI, there have been a number 
of proposals for efficiently confining untrusted software to a memory sandbox (see [23, 
24, 31, 32, 34, 37, 39]). One of the most prominent is Google’s Native Client (NaCl) [37], 
which provides an infrastructure for executing untrusted native code in a web browser. 
NaCl was specifically targeted at executing computation-intensive applications without 
incurring a performance penalty. Certain features (in particular self-modifying code) 
were ruled out. These restrictions were addressed in a subsequent work [3]. 

RockSalt [24] is an SFI verifier for x86 code which has been developed and formally 
verified with the proof assistant Coq. The major contribution of RockSalt is to provide a 
formal model of the x86 architecture, from which it is possible to extract a decoder for a 
subset of the very rich set of x86 instructions, and build a verifier for the NaCl sandbox 
policy. Their experiments show that the formally verified checker performs marginally 
better than the NaCl verifier. In comparison, our approach avoids the complexities of 
the x86 instruction set by relying on the COMPCERT compiler back-end to produce 
binaries whose adherence to the sandbox policy is guaranteed by a combination of 
a sandbox verification at a higher level (CMINOR) and the COMPCERT’s correctness 
theorem. 

ARMor [39] uses the binary rewriter Diablo [28] to implement SFI for ARM
processors. Using an untrusted program analysis, a proof of SFI safety is automatically
constructed using the HOL theorem prover. ARMor was tested with some programs
of the MiBench benchmark [11], namely BitCount and StringSearch. These programs
required 2.5 h and 8 h, respectively, to prove the memory safety and control-flow integrity
of the executables, which means that the approach is not practically viable as it stands.

Kroll et al. [16] proposed PSFI as an alternative methodology to the standard, 
verification-based SFI. In PSFI, the sandbox is built by inserting the necessary mask- 
ing instructions during compilation. This means that the correctness of the transfor- 
mation can be argued at an intermediate stage in the compilation where the program 
representation retains a high-level structure. Our work extends the seminal proposal in 
a number of ways that we detail below. Unlike Kroll et al., we exclude from the TCB 
the masking primitive and the trampoline mechanism for calling external functions. 
In our implementation, these crucial components are written entirely in CMINOR and 
proved correct without introducing trusted, unproved code. Kroll et al. sketch a proof
of safety but do not identify the issue of pointer arithmetic. To sidestep the semantic
limitations of pointer arithmetic, we introduce a compile-time encoding of pointers as
integers. This transformation is instrumental for our Coq-verified proof of safety, which
itself is mandatory to transfer security down to assembly.

Since the seminal work of Norrish [27], several works propose formal semantics of 
the C language [8, 12, 15]. All these share the limitations of COMPCERT with respect to 
pointer arithmetic. Recent works specifically aim at providing a more defined semantics 
for pointers. The proposal of Besson et al. [4] is able to cope with most existing low-level 
pointer manipulations and has been ported to COMPCERT [5,6]. Yet it nonetheless has
limitations, and the design of our PSFI transformation would not benefit from the
increased expressiveness. The semantics of Kang et al. [14] is more permissive because, 
after a cast, a pointer is indistinguishable from an integer value. To our knowledge, their 
semantics has not been ported to the COMPCERT compiler. Our SFI transformation 
has the advantage of being compatible with the existing semantics of COMPCERT with 
the caveat that pointers need to be explicitly compiled into integers.


10 Conclusion 


We have presented COMPCERTSFI, a formally verified implementation of Software Fault 
Isolation based on the COMPCERT compiler. Our approach provides security guaran- 
tees at runtime when the source code may be malicious or has security vulnerabilities 
but the build process is trusted. This is typically the case when a final product is built 
using code originating from multiple third parties. Our work shows that it is possible 
to perform security-enhancing compilation that is both formally verified and competi- 
tive with existing approaches in terms of efficiency. COMPCERTSFI does not rely on a 
posteriori binary verification for guaranteeing security, and hence has a reduced TCB 
compared to traditional SFI solutions. The reduction in TCB is obtained through a 
formal, machine-checked proof of the fact that the security guaranteed by our SFI trans- 
formation in the compiler front-end, still holds at the assembly level. Key to achieving 
this property has been to fine-tune the transformation (and in particular its pointer 
manipulations) to ensure that the secured program has a well-defined semantics. 

The impact of SFI has been evaluated on a series of benchmarks, showing that the 
transformed code can in a few cases be more efficient, and that the average runtime 
overhead incurred is about 9%. We have evaluated the impact of back-end optimi- 
sation on the transformed code on three different compilers. The gains vary, with 
CLANG being more efficient than COMPCERT and GCC, and COMPCERT being slightly 
more efficient than GCC. The experiments show that COMPCERTSFI combined with an
aggressive back-end optimiser can sometimes achieve performance superior to Native
Client implementations. In addition, there is still room for further optimisation of the 
generated code. We have observed that existing optimisations are sometimes hindered 
by our SFI transformation, so we gain by having more optimisation before the SFI 
transformation. We also intend to investigate optimisations for removing redundant 
sandboxing operations and in particular hoisting sandboxing outside loops. 
Abstract. Incremental computation has recently been studied using the
concepts of change structures and derivatives of programs, where the 
derivative of a function allows updating the output of the function based 
on a change to its input. We generalise change structures to change 
actions, and study their algebraic properties. We develop change actions 
for common structures in computer science, including directed-complete 
partial orders and Boolean algebras. We then show how to compute 
derivatives of fixpoints. This allows us to perform incremental evaluation 
and maintenance of recursively defined functions with particular application
to generalised Datalog programs. Moreover, unlike previous results, our
techniques are modular in that they are easy to apply both to variants 
of Datalog and to other programming languages. 


Keywords: Incremental computation - Datalog - Semantics - 
Fixpoints 


1 Introduction 


Consider the following classic Datalog program,¹ which computes the transitive
closure of an edge relation e:


$$
\begin{aligned}
tc(x, y) &\leftarrow e(x, y) \\
tc(x, y) &\leftarrow e(x, z) \wedge tc(z, y)
\end{aligned}
$$


The semantics of Datalog tells us that the denotation of this program is 
the least fixpoint of the rule tc. Kleene’s fixpoint Theorem tells us that we can 
compute this fixpoint by repeatedly applying the rule until the output stops 
changing, starting from the empty relation. For example, supposing that e = 
{(1, 2), (2,3), (3,4)}, we get the following evaluation trace: 


1 See [1, part D] for an introduction to Datalog. 
Iteration  Newly deduced facts                           Accumulated data in tc

0          {}                                            {}
1          {(1,2), (2,3), (3,4)}                         {(1,2), (2,3), (3,4)}
2          {(1,2), (2,3), (3,4), (1,3), (2,4)}           {(1,2), (2,3), (3,4), (1,3), (2,4)}
3          {(1,2), (2,3), (3,4), (1,3), (2,4), (1,4)}    {(1,2), (2,3), (3,4), (1,3), (2,4), (1,4)}
4          (as above)                                    (as above)


At this point we have reached a fixpoint, and so we are done. 

However, this process is quite wasteful. We deduced the fact (1,2) at every 
iteration, even though we had already deduced it in the first iteration. Indeed, 
for a chain of n such edges we will deduce $O(n^2)$ facts along the way.
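
The naive strategy traced above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: relations are modelled as sets of pairs, and the names `naive_tc` and `deductions` are hypothetical, not taken from any Datalog engine.

```python
def naive_tc(edges):
    """Naive evaluation of: tc(x,y) <- e(x,y);  tc(x,y) <- e(x,z) and tc(z,y)."""
    tc = set()
    deductions = 0  # facts deduced over all iterations, repeats included
    while True:
        # Re-apply both rules to the whole current approximation of tc.
        step = {(x, y) for (x, z) in edges for (w, y) in tc if z == w}
        derived = set(edges) | step
        deductions += len(derived)
        if derived == tc:  # the output stopped changing: fixpoint reached
            return tc, deductions
        tc = derived


e = {(1, 2), (2, 3), (3, 4)}
closure, work = naive_tc(e)
print(sorted(closure), work)
```

On the three-edge example the loop deduces 3 + 5 + 6 + 6 = 20 facts in order to produce a result that contains only 6.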

The standard improvement to this evaluation strategy is known as “semi- 
naive” evaluation (see [1, section 13.1]), where we transform the program into a 
delta program with two parts: 


— A delta rule that computes the new facts at each iteration. 
— An accumulator rule that accumulates the delta at each iteration to compute 
the final result. 


In this case our delta rule is simple: we only get new transitive edges at iteration 
n + 1 if we can deduce them from transitive edges we deduced at iteration n. 


$$
\begin{aligned}
\Delta tc_0(x, y) &\leftarrow e(x, y) \\
\Delta tc_{i+1}(x, y) &\leftarrow e(x, z) \wedge \Delta tc_i(z, y) \\
tc_0(x, y) &\leftarrow \Delta tc_0(x, y) \\
tc_{i+1}(x, y) &\leftarrow tc_i(x, y) \vee \Delta tc_{i+1}(x, y)
\end{aligned}
$$

Iteration  Δtc_i                    tc_i

0          {(1,2), (2,3), (3,4)}    {(1,2), (2,3), (3,4)}
1          {(1,3), (2,4)}           {(1,2), (2,3), (3,4), (1,3), (2,4)}
2          {(1,4)}                  {(1,2), (2,3), (3,4), (1,3), (2,4), (1,4)}
3          {}                       (as above)


This is much better—we have turned a quadratic computation into a linear 
one. The delta transformation is a kind of incremental computation: at each stage 
we compute the changes in the rule given the previous changes to its inputs. 
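
A minimal Python sketch of the delta program may make the saving concrete. It follows the delta and accumulator rules above, with one labelled assumption of ours: facts that are already known are discarded from each delta so that the loop also terminates on cyclic graphs. The names `seminaive_tc` and `deductions` are hypothetical.

```python
def seminaive_tc(edges):
    """Semi-naive evaluation of transitive closure using a delta relation."""
    delta = set(edges)   # delta_tc_0(x,y) <- e(x,y)
    tc = set(delta)      # tc_0(x,y) <- delta_tc_0(x,y)
    deductions = len(delta)
    while delta:
        # delta_tc_{i+1}(x,y) <- e(x,z) and delta_tc_i(z,y)
        new = {(x, y) for (x, z) in edges for (w, y) in delta if z == w}
        deductions += len(new)
        # Assumption (not part of the rules above): drop facts that are
        # already known, so the loop also terminates on cyclic graphs.
        delta = new - tc
        tc |= delta      # tc_{i+1} <- tc_i or delta_tc_{i+1}
    return tc, deductions


e = {(1, 2), (2, 3), (3, 4)}
closure, work = seminaive_tc(e)
print(sorted(closure), work)
```

On the same three-edge example each of the 6 facts is deduced exactly once, matching the delta column of the trace.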

But the delta rule translation works only for traditional Datalog. It is common
to liberalise the formula syntax with additional features, such as disjunction,
existential quantification, negation, and aggregation.² This allows us to


2 See, for example, LogiQL [26,32], Datomic [18], Souffle [38,42], and DES [36], which
between them have all of these features and more. We do not here explore supporting 
extensions to the syntax of rule heads, although as long as this can be given a 
denotational semantics in a similar style our techniques should be applicable. 
write programs like the following, where we compute whether all the nodes in a
subtree given by child have some property p:


$$treeP(x) \leftarrow p(x) \wedge \neg\exists y.(child(x, y) \wedge \neg treeP(y))$$


The body of this predicate amounts to recursion through a universal quantifier
(encoded as $\neg\exists\neg$). We would like to be able to use semi-naive evaluation
for this rule too, but the standard definition of semi-naive transformation is not 
well defined for the extended program syntax, and it is unclear how to extend it 
(and the correctness proof) to handle such cases. 

It is possible, however, to write a delta program for treeP by hand; indeed,
here is a definition for the delta predicate (the accumulator is as before):³


$$
\begin{aligned}
\Delta_{i+1} treeP(x) \leftarrow{}\; & p(x) \\
& \wedge \exists y.(child(x, y) \wedge \Delta_i treeP(y)) \\
& \wedge \neg\exists y.(child(x, y) \wedge \neg treeP_i(y))
\end{aligned}
$$


This is a correct delta program (in that using it to iteratively compute treeP 
gives the right answer), but it is not precise because it derives some facts repeat- 
edly. We will show how to construct correct delta programs generally using a 
program transformation, and show how we have some freedom to optimize within 
a range of possible alternatives to improve precision or ease evaluation. 

Handling extended Datalog is of more than theoretical interest—the research 
in this paper was carried out at Semmle, which makes heavy use of a commercial 
Datalog implementation to implement large-scale static program analysis [7, 37,
39, 40]. Semmle’s implementation includes parity-stratified negation⁴, recursive
aggregates [34], and other non-standard features, so we are faced with a dilemma: 
either abandon the new language features, or abandon incremental computation. 

We can tell a similar story about maintenance of Datalog programs. Main- 
tenance means updating the results of the program when its inputs change, for 
example, updating the value of tc given a change to e. Again, this is a kind of 
incremental computation, and there are known solutions for traditional Datalog 
[25], but these break down when the language is extended. 

There is a piece of folkloric knowledge in the Datalog community that hints 
at a solution: the semi-naive translation of a rule corresponds to the derivative 
of that rule [8,9, section 3.2.2]. The idea of performing incremental computation 
using derivatives has been studied recently by Cai et al. [14], who give an account 
using change structures. They use this to provide a framework for incrementally 
evaluating lambda calculus programs. 


3 This rule should be read as: we can newly deduce that x is in treeP if x satisfies the 
predicate, and we have newly deduced that one of its children is in treeP, and we 
currently believe that all of its children are in treeP. 

4 Parity-stratified negation means that recursive calls must appear under an even
number of negations. This ensures that the rule remains monotone, so the least 
fixpoint still exists. 
However, Cai et al.’s work isn’t directly applicable to Datalog: the tricky parts
of Datalog’s semantics are recursive definitions and the need for fixpoints, so
we need some additional theory to tell us how to handle incremental evaluation 
and maintenance of fixpoint computations. 

This paper aims to bridge that gap by providing a solid semantic foundation 
for the incremental computation of Datalog, and other recursive programs, in 
terms of changes and differentiable functions. 


Contributions. We start by generalizing change structures to change actions 
(Sect. 2). Change actions are simpler and weaker than change structures, while 
still providing enough structure to handle incremental computation, and have 
fruitful interactions with a variety of structures (Sects. 3 and 6.1). 

We then show how change actions can be used to perform incremental eval- 
uation and maintenance of non-recursive program semantics, using the formula 
semantics of generalized Datalog as our primary example (Sect. 4). Moreover, the 
structure of the approach is modular, and can accommodate arbitrary additional 
formula constructs (Sect. 4.3). 

We also provide a method of incrementally computing and maintaining fixpoints
(Sect. 6.2). We use this to perform incremental evaluation and maintenance
of recursive program semantics, including generalized recursive Datalog
(Sect. 7). This provides, to the best of our knowledge, the world’s first incremen- 
tal evaluation and maintenance mechanism for Datalog that can handle negation, 
disjunction, and existential quantification. 

We have omitted the proofs from this paper. Most of the results have routine
proofs, but the proofs of the more substantial results (especially those in
Sect. 6.2) are included in an extended report [3], along with some extended
worked examples, and additional material on the precision of derivatives. 


2 Change Actions and Derivatives 


Incremental computation requires understanding how values change. For exam- 
ple, we can change an integer by adding a natural to it. Abstractly, we have a 
set of values (the integers), and a set of changes (the naturals) which we can 
“apply” to a value (by addition) to get a new value. 

This kind of structure is well-known—it is a set action. It is also very natural 
to want to combine changes sequentially, and if we do this then we find ourselves 
with a monoid action. 

Using monoid actions for changes gives us a reason to think that change
actions are an adequate representation of changes: any subset of $A \to A$ which
is closed under composition can be represented as a monoid action on $A$, so we
are able to capture all of these as change actions.


2.1 Change Actions 


Definition 1. A change action is a tuple:


$$\hat{A} := (A, \Delta A, \oplus_A)$$

where $A$ is a set, $\Delta A$ is a monoid, and $\oplus_A : A \times \Delta A \to A$ is a monoid action
on $A$.⁵

We will call $A$ the base set, and $\Delta A$ the change set of the change action. We
will use $\cdot$ for the monoid operation of $\Delta A$, and $0$ for its identity element. When
there is no risk of confusion, we will simply write $\oplus$ for $\oplus_A$.


Examples. A typical example of a change action is $(A^*, A^*, \mathbin{+\!+})$ where $A^*$ is the
set of finite words (or lists) of $A$. Here we represent changes to a word made by
concatenating another word onto it. The changes themselves can be combined
using $\mathbin{+\!+}$ as the monoid operation with the empty word as the identity, and this
is a monoid action: $(a \mathbin{+\!+} b) \mathbin{+\!+} c = a \mathbin{+\!+} (b \mathbin{+\!+} c)$.

This is a very common case: any monoid $(A, \cdot, 0)$ can be seen as a change
action $(A, (A, \cdot, 0), \cdot)$. Many practical change actions can be constructed in this
way. In particular, for any change action $(A, \Delta A, \oplus)$, $(\Delta A, \Delta A, \cdot)$ is also a change
action. This means that we do not have to do any extra work to talk about
changes to changes: we can always take $\Delta\Delta A = \Delta A$ (although there may be
other change actions available).
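
As a rough illustration of Definition 1 and of the monoid construction, the following Python sketch packages a change action as a record. The class `ChangeAction` and the helpers `from_monoid` and `changes_of` are hypothetical names of ours, and the monoid and action laws are checked only on sample values rather than enforced.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ChangeAction:
    """A change action: `act` is the action, `compose`/`zero` the monoid of changes."""
    act: Callable[[Any, Any], Any]
    compose: Callable[[Any, Any], Any]
    zero: Any


def from_monoid(op, unit):
    """View a monoid (A, op, unit) as the change action (A, (A, op, unit), op)."""
    return ChangeAction(act=op, compose=op, zero=unit)


def changes_of(action):
    """The change action (dA, dA, .) on the change set itself."""
    return ChangeAction(act=action.compose, compose=action.compose, zero=action.zero)


words = from_monoid(lambda a, b: a + b, "")   # strings under concatenation
a, da, db = "ab", "c", "de"
lhs = words.act(words.act(a, da), db)         # apply da, then db
rhs = words.act(a, words.compose(da, db))     # apply the combined change once
print(lhs, rhs)
```

Both routes yield the same word, which is the monoid action law specialised to concatenation.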

Three examples of change actions are of particular interest to us. First, whenever
$L$ is a Boolean algebra, we can give it the change actions $(L, L, \vee)$ and
$(L, L, \wedge)$, as well as a combination of these (see Sect. 3.2). Second, the natural
numbers with addition have a change action $\hat{\mathbb{N}} := (\mathbb{N}, \mathbb{N}, +)$, which will prove
useful during inductive proofs.

Another interesting example of change actions is semiautomata. A semiautomaton
is a triple $(Q, \Sigma, T)$, where $Q$ is a set of states, $\Sigma$ is a (non-empty) finite
input alphabet and $T : Q \times \Sigma \to Q$ is a transition function. Every semiautomaton
corresponds to a change action $(Q, \Sigma^*, T^*)$ on the free monoid over $\Sigma$, with $T^*$
being the free extension of $T$. Conversely, every change action $\hat{A}$ whose change
set $\Delta A$ is freely generated by a finite set corresponds to a semiautomaton.
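
The free extension of a transition function is just a left fold over the input word, as the following sketch suggests. The two-state automaton `parity_step` is a made-up example, and `free_extension` is a hypothetical helper name.

```python
from functools import reduce


def free_extension(T):
    """Extend a transition function on single symbols to whole words."""
    return lambda q, word: reduce(T, word, q)


def parity_step(q, symbol):
    """Hypothetical two-state automaton: 'a' flips the state, 'b' keeps it."""
    return 1 - q if symbol == "a" else q


T_star = free_extension(parity_step)
u, v = "aab", "ba"
print(T_star(0, u), T_star(0, u + v))
```

Applying `u` and then `v` agrees with applying the concatenated word, and the empty word leaves the state unchanged, so `T_star` behaves as a monoid action of words on states.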

Other recurring examples of change actions are: 


- $\hat{A}_\bot := (A, M, \lambda(a, \delta a).\, a)$, where $M$ is any monoid, which we call the empty
change action on any base set, since it induces no changes at all.

- $\hat{A}_\top := (A, A \to A, \mathrm{ev})$, where $A$ is an arbitrary set, $A \to A$ denotes the set
of all functions from $A$ into itself, considered as a monoid under composition
and $\mathrm{ev}$ is the usual evaluation map. We will call this the “full” change action
on $A$ since it contains every possible non-redundant change.


These are particularly relevant because they are, in a sense, the “smallest” and 
“largest” change actions that can be imposed on an arbitrary set A. 

Many other notions in computer science can be understood naturally in terms 
of change actions, e.g. databases and database updates, files and diffs, Git repos- 
itories and commits, even video compression algorithms that encode a frame as 
a series of changes to the previous frame. 


5 Why not just work with monoid actions? The reason is that while the category of 
monoid actions and the category of change actions have the same objects, they have 
different morphisms. See Sect. 8.1 for further discussion. 
2.2 Derivatives


When we do incremental computation we are usually trying to save ourselves
some work. We have an expensive function $f : A \to B$, which we’ve evaluated
at some point $a$. Now we are interested in evaluating $f$ after some change $\delta a$ to
$a$, but ideally we want to avoid actually computing $f(a \oplus \delta a)$ directly.

A solution to this problem is a function $f' : A \times \Delta A \to \Delta B$, which given $a$
and $\delta a$ tells us how to change $f(a)$ to $f(a \oplus \delta a)$. We call this a derivative of a
function.


Definition 2. Let $\hat{A}$ and $\hat{B}$ be change actions. A derivative of a function
$f : A \to B$ is a function $f' : A \times \Delta A \to \Delta B$ such that


$$f(a \oplus_A \delta a) = f(a) \oplus_B f'(a, \delta a)$$


A function which has a derivative is differentiable, and we will write $\hat{A} \to \hat{B}$ for
the set of differentiable functions between $\hat{A}$ and $\hat{B}$.⁶
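
To make the defining equation concrete, here is a small sketch under our own assumptions: inputs are lists changed by appending a list, outputs are integers changed by adding an integer, and `sum` stands in for an expensive function. The names `f` and `f_prime` are illustrative only.

```python
def f(xs):
    """Stand-in for an expensive function: the sum of a list of integers."""
    return sum(xs)


def f_prime(xs, dxs):
    """A derivative of f: the output change depends only on the appended items."""
    return sum(dxs)


a, da = [1, 2, 3], [4, 5]
recomputed = f(a + da)                 # f applied to the updated input, from scratch
incremental = f(a) + f_prime(a, da)    # old output updated by the derivative
print(recomputed, incremental)
```

The derivative only inspects the change, so updating the old output costs time proportional to the size of the change rather than the size of the whole input.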


Derivatives need not be unique in general, so we will speak of “a” derivative.
Functions into “thin” change actions, where $a \oplus \delta a = a \oplus \delta b$ implies $\delta a = \delta b$,
have unique derivatives, but many change actions are not thin. For example,
$(\mathcal{P}(\mathbb{N}), \mathcal{P}(\mathbb{N}), \cap)$ is not thin because $\{0\} \cap \{1\} = \{0\} \cap \{2\}$.

Derivatives capture the structure of incremental computation, but there are 
important operational considerations that affect whether using them for compu- 
tation actually saves us any work. As we will see in a moment (Proposition 1), for 
many change actions we will have the option of picking the “worst” derivative, 
which merely computes $f(a \oplus \delta a)$ directly and then works out the change that
maps f(a) to this new value. While this is formally a derivative, using it cer- 
tainly does not save us any work! We will be concerned with both the possibility 
of constructing correct derivatives (Sects. 3.2 and 6.2 in particular), and also in 
giving ourselves a range of derivatives to choose from so that we can soundly 
optimize for operational value. 

For our Datalog case study, we aim to cash out the folkloric idea that incre- 
mental computation functions via a derivative. We will construct a derivative 
of the semantics of Datalog in stages: first the non-recursive formula semantics 
(Sect. 4); and later the full, recursive, semantics (Sect. 7). 


2.3 Useful Facts About Change Actions and Derivatives 


The Chain Rule. The derivative of a function can be computed composition- 
ally, because derivatives satisfy the standard chain rule. 


6 Note that we do not require that $f'(a, \delta a \cdot \delta b) = f'(a, \delta a) \cdot f'(a \oplus \delta a, \delta b)$ nor that
$f'(a, 0) = 0$. These are natural conditions, and all the derivatives we have studied
also satisfy them, but none of the results in this paper require them to hold.
Theorem 1 (The Chain Rule). Let $f : \hat{A} \to \hat{B}$, $g : \hat{B} \to \hat{C}$ be differentiable
functions. Then $g \circ f$ is also differentiable, with a derivative given by


$$(g \circ f)'(x, \delta x) = g'(f(x), f'(x, \delta x))$$


or, in curried form


$$(g \circ f)'(x) = g'(f(x)) \circ f'(x)$$
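
The chain rule translates directly into a combinator on derivatives. The sketch below is illustrative only: `chain` is a hypothetical helper, the inner function is `len` on lists changed by appending, and the outer function squares an integer changed by addition.

```python
def chain(f, f_prime, g_prime):
    """Derivative of g . f, built from f, a derivative of f and a derivative of g."""
    return lambda x, dx: g_prime(f(x), f_prime(x, dx))


# f : lists (changed by appending) -> integers (changed by adding)
f = len
f_prime = lambda xs, dxs: len(dxs)

# g : integers -> integers, both changed by adding
g = lambda n: n * n
g_prime = lambda n, dn: 2 * n * dn + dn * dn   # (n + dn)^2 - n^2

h = lambda xs: g(f(xs))
h_prime = chain(f, f_prime, g_prime)

x, dx = [1, 2, 3], [4, 5]
print(h(x + dx), h(x) + h_prime(x, dx))
```

Note that the composite derivative still evaluates `f(x)`; in practice that value would be cached from the original computation.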


Complete change actions and minus operators. Complete change actions 
are an important class of change actions, because they have changes between 
any two values in the base set. 


Definition 3. A change action is complete if for any $a, b \in A$, there is a change
$\delta a \in \Delta A$ such that $a \oplus \delta a = b$.


Complete change actions have convenient “minus operators” that allow us to
compute the difference between two values.


Definition 4. A minus operator is a function $\ominus : A \times A \to \Delta A$ such that
$a \oplus (b \ominus a) = b$ for all $a, b \in A$.


Proposition 1. Given a minus operator $\ominus$, and a function $f$, let


$$f'_\ominus(a, \delta a) := f(a \oplus \delta a) \ominus f(a)$$

Then $f'_\ominus$ is a derivative for $f$.
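
As a sketch of Proposition 1, take the integers changed by adding an integer, which is a complete change action with subtraction as a minus operator. The names `act`, `minus` and `derivative_from_minus` are our own hypothetical choices.

```python
def act(a, da):
    """Integers changed by adding an integer: a complete change action."""
    return a + da


def minus(b, a):
    """A minus operator: the change taking a to b, so act(a, minus(b, a)) == b."""
    return b - a


def derivative_from_minus(f):
    """Proposition 1: the derivative that recomputes f and takes the difference."""
    return lambda a, da: minus(f(act(a, da)), f(a))


cube = lambda n: n ** 3
cube_prime = derivative_from_minus(cube)
print(cube_prime(2, 3))
```

This derivative is correct by construction, but it calls `f` on the updated input, so by itself it saves no work.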


Proposition 2. Let $\hat{A}$ be a change action. Then the following are equivalent:


- $\hat{A}$ is complete.
- There is a minus operator on $\hat{A}$.
- For any change action $\hat{B}$ all functions $f : B \to A$ are differentiable.


This last property is of the utmost importance, since we are often concerned 
with the differentiability of functions. 


Products and sums. Given change actions on sets $A$ and $B$, the question
immediately arises of whether there are change actions on their Cartesian product
$A \times B$ or disjoint union $A + B$. While there are many candidates, there is a
clear “natural” choice for both.


Proposition 3 (Products). Let $\hat{A} = (A, \Delta A, \oplus_A)$ and $\hat{B} = (B, \Delta B, \oplus_B)$ be
change actions.

Then $\hat{A} \times \hat{B} := (A \times B, \Delta A \times \Delta B, \oplus_\times)$ is a change action, where $\oplus_\times$ is
defined by:


$$(a, b) \oplus_\times (\delta a, \delta b) := (a \oplus_A \delta a, b \oplus_B \delta b)$$

The projection maps $\pi_1, \pi_2$ are differentiable with respect to it. Furthermore,
a function $f : A \times B \to C$ is differentiable from $\hat{A} \times \hat{B}$ into $\hat{C}$ if and only if, for
every fixed $a \in A$ and $b \in B$, the partially applied functions

$$f(a, \cdot) : B \to C \qquad f(\cdot, b) : A \to C$$


are differentiable.
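
A brief sketch of the product construction, assuming integers changed by addition in the first component and strings changed by appending in the second; `product_act`, `fst_prime` and `snd_prime` are illustrative names of ours.

```python
def product_act(act_a, act_b):
    """Componentwise action on pairs of values and pairs of changes."""
    return lambda ab, dab: (act_a(ab[0], dab[0]), act_b(ab[1], dab[1]))


add = lambda n, dn: n + dn          # integers changed by adding
append = lambda s, ds: s + ds       # strings changed by appending
act = product_act(add, append)

# Derivatives of the two projections simply project the change.
fst_prime = lambda ab, dab: dab[0]
snd_prime = lambda ab, dab: dab[1]

ab, dab = (1, "x"), (2, "yz")
updated = act(ab, dab)
print(updated)
```

Each projection of the updated pair equals the old projection updated by the projected change, which is the derivative condition for the projection maps.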


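Whenever $f : A \times B \to C$ is differentiable, we will sometimes use $\partial_1 f$ and $\partial_2 f$
to refer to derivatives of the partially applied versions, i.e. if $f'_a : B \times \Delta B \to \Delta C$
and $f'_b : A \times \Delta A \to \Delta C$ refer to derivatives for $f(a, \cdot)$, $f(\cdot, b)$ respectively, then


$$
\begin{aligned}
\partial_1 f &: A \times \Delta A \times B \to \Delta C \\
\partial_1 f(a, \delta a, b) &:= f'_b(a, \delta a) \\
\partial_2 f &: A \times B \times \Delta B \to \Delta C \\
\partial_2 f(a, b, \delta b) &:= f'_a(b, \delta b)
\end{aligned}
$$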


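Proposition 4 (Disjoint unions). Let $\hat{A} = (A, \Delta A, \oplus_A)$ and
$\hat{B} = (B, \Delta B, \oplus_B)$ be change actions.

Then $\hat{A} + \hat{B} := (A + B, \Delta A \times \Delta B, \oplus_+)$ is a change action, where $\oplus_+$ is
defined as:


$$
\begin{aligned}
\iota_1 a \oplus_+ (\delta a, \delta b) &:= \iota_1 (a \oplus_A \delta a) \\
\iota_2 b \oplus_+ (\delta a, \delta b) &:= \iota_2 (b \oplus_B \delta b)
\end{aligned}
$$


The injection maps $\iota_1, \iota_2$ are differentiable with respect to $\hat{A} + \hat{B}$. Furthermore,
whenever $\hat{C}$ is a change action and $f : A \to C$, $g : B \to C$ are differentiable,
then so is $[f, g]$.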


2.4 Comparing Change Actions 


Much like topological spaces, we can compare change actions on the same base set 
according to coarseness. This is useful since differentiability of functions between 
change actions is characterized entirely by the coarseness of the actions. 


Definition 5. Let $\hat{A}_1$ and $\hat{A}_2$ be change actions on $A$. We say that $\hat{A}_1$ is coarser
than $\hat{A}_2$ (or that $\hat{A}_2$ is finer than $\hat{A}_1$) whenever for every $x \in A$ and change
$\delta a_1 \in \Delta A_1$, there is a change $\delta a_2 \in \Delta A_2$ such that $x \oplus_{A_1} \delta a_1 = x \oplus_{A_2} \delta a_2$.

We will write $\hat{A}_1 \leq \hat{A}_2$ whenever $\hat{A}_1$ is coarser than $\hat{A}_2$. If $\hat{A}_1$ is both finer
and coarser than $\hat{A}_2$, we will say that $\hat{A}_1$ and $\hat{A}_2$ are equivalent.


The relation $\leq$ defines a preorder (but not a partial order) on the set of all
change actions over a fixed set $A$. Least and greatest elements do exist up to
equivalence, and correspond respectively to the empty change action $\hat{A}_\bot$ and any
complete change action, such as the full change action $\hat{A}_\top$, defined in Sect. 2.1.


Proposition 5. Let $\hat{A}_2 \leq \hat{A}_1$, $\hat{B}_1 \leq \hat{B}_2$ be change actions, and suppose the
function $f : A \to B$ is differentiable as a function from $\hat{A}_1$ into $\hat{B}_1$. Then $f$ is
differentiable as a function from $\hat{A}_2$ into $\hat{B}_2$.


A consequence of this fact is that whenever two change actions are equivalent 
they can be used interchangeably without affecting which functions are differen- 
tiable. One last parallel with topology is the following result, which establishes 
a simple criterion for when a change action is coarser than another: 


Proposition 6. Let $\hat{A}_1$, $\hat{A}_2$ be change actions on $A$. Then $\hat{A}_1$ is coarser than
$\hat{A}_2$ if and only if the identity function $\mathrm{id} : A \to A$ is differentiable from $\hat{A}_1$ to $\hat{A}_2$.


3 Posets and Boolean Algebras 


The semantic domain of Datalog is a complete Boolean algebra, and so our next 
step is to construct a good change action for Boolean algebras. Along the way, we 
will consider change actions over posets, which give us the ability to approximate 
derivatives, which will turn out to be very important in practice. 


3.1 Posets 


Ordered sets give us a constrained class of functions: monotone functions. We 
can define ordered change actions, which are those that are well-behaved with 
respect to the order on the underlying set.⁷


Definition 6. A change action $\hat{A}$ is ordered if


- $A$ and $\Delta A$ are posets.
- $\oplus$ is monotone as a map from $A \times \Delta A \to A$
- $\cdot$ is monotone as a map from $\Delta A \times \Delta A \to \Delta A$


In fact, any change action whose base set is a poset induces a partial order 
on the corresponding change set: 


Definition 7. $\delta a \leq_\Delta \delta b$ iff for all $a \in A$ it is the case that $a \oplus \delta a \leq a \oplus \delta b$.


Proposition 7. Let $\hat{A}$ be a change action on a set $A$ equipped with a partial
order $\leq$ such that $\oplus$ is monotone in its first argument. Then $\hat{A}$ is an ordered
change action when $\Delta A$ is equipped with the partial order $\leq_\Delta$.


In what follows, we will extend the partial order $\leq_\Delta$ on some change set
$\Delta B$ pointwise to functions from some $A$ into $\Delta B$. This pointwise order interacts
nicely with derivatives, in that it gives us the following lemma:


7 If we were giving a presentation that was generic in the base category, then this
would simply be the definition of being a change action in the category of posets 
and monotone maps. 


534 M. Alvarez-Picallo et al. 


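Theorem 2 (Sandwich lemma). Let $\hat{A}$ be a change action, and $\hat{B}$ be an
ordered change action, and let $f : A \to B$ and $g : A \times \Delta A \to \Delta B$ be functions. If
$f'_\downarrow$ and $f'_\uparrow$ are derivatives for $f$ such that


$$f'_\downarrow \leq_\Delta g \leq_\Delta f'_\uparrow$$

then $g$ is a derivative for $f$.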


If unique minimal and maximal derivatives exist, then this gives us a char- 
acterisation of all the derivatives for a function. 


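Theorem 3. Let $\hat{A}$ and $\hat{B}$ be change actions, with $\hat{B}$ ordered, and let $f : A \to B$
be a function. If there exist $f'_{\downarrow\downarrow}$ and $f'_{\uparrow\uparrow}$ which are unique minimal and maximal
derivatives of $f$, respectively, then the derivatives of $f$ are precisely the functions
$f'$ such that

$$f'_{\downarrow\downarrow} \leq_\Delta f' \leq_\Delta f'_{\uparrow\uparrow}$$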


This theorem gives us the leeway that we need when trying to pick a deriva- 
tive: we can pick out the bounds, and that tells us how much “wiggle room” we 
have above and below. 


3.2 Boolean Algebras 


Complete Boolean algebras are a particularly nice domain for change actions 
because they have a negation operator. This is very helpful for computing dif- 
ferences, and indeed Boolean algebras have a complete change action. 


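Proposition 8 (Boolean algebra change actions). Let $L$ be a complete
Boolean algebra. Define

$$\hat{L}_\bowtie := (L, L \bowtie L, \oplus_\bowtie)$$


where


$$
\begin{aligned}
L \bowtie L &:= \{(a, b) \in L \times L \mid a \wedge b = \bot\} \\
a \oplus_\bowtie (p, q) &:= (a \vee p) \wedge \neg q \\
(p, q) \cdot (r, s) &:= ((p \wedge \neg s) \vee r, (q \wedge \neg r) \vee s)
\end{aligned}
$$


with identity element $(\bot, \bot)$.
Then $\hat{L}_\bowtie$ is a complete change action on $L$.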


We can think of $\hat{L}_\bowtie$ as tracking changes as pairs of “upwards” and “downwards”
changes, where the monoid action simply applies one after the other, with
an adjustment to make sure that the components remain disjoint.⁸


8 The intuition that $\hat{L}_\bowtie$ is made up of an “upwards” and a “downwards” change action
glued together can in fact be made precise, but the specifics are outside the scope
of this paper.


For example, in the powerset Boolean algebra $\mathcal{P}(\mathbb{N})$, a change to $\{1,2\}$ might consist
of adding $\{3\}$ and removing $\{1\}$, producing $\{2,3\}$. In $\mathcal{P}(\mathbb{N})_\bowtie$, this would be
represented as $\{1,2\} \oplus (\{3\}, \{1\}) = \{2,3\}$.
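
For finite sets, the change action of Proposition 8 can be sketched directly with Python's `frozenset`, where union, intersection and set difference play the roles of join, meet and "and not". The names `act`, `compose` and `IDENTITY` are ours, and the second change `d2` is a made-up example.

```python
def act(a, change):
    """Apply a change (p, q): add p, then remove q."""
    p, q = change
    return (a | p) - q


def compose(first, second):
    """Combine two changes; `first` is applied before `second`."""
    (p, q), (r, s) = first, second
    return ((p - s) | r, (q - r) | s)


IDENTITY = (frozenset(), frozenset())

a = frozenset({1, 2})
d1 = (frozenset({3}), frozenset({1}))      # add {3}, remove {1}
d2 = (frozenset({1, 4}), frozenset({3}))   # add {1, 4}, remove {3}
both = compose(d1, d2)
print(sorted(act(a, d1)), sorted(act(a, both)))
```

Applying `d1` and then `d2` gives the same set as applying the composite once, and the two components of the composite stay disjoint.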

Boolean algebras also have unique maximal and minimal derivatives, under
the usual partial order based on implication. The change set is, as usual, given
the change partial order, which in this case corresponds to the natural order on
$L \times L^{op}$.


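Proposition 9. Let $L$ be a complete Boolean algebra with the $\hat{L}_\bowtie$ change action,
and $f : A \to L$ be a function. Then, the following are minus operators:


$$
\begin{aligned}
a \ominus_\bot b &= (a \wedge \neg b, \neg a) \\
a \ominus_\top b &= (a, b \wedge \neg a)
\end{aligned}
$$


Additionally, $f'_{\ominus_\bot}$ and $f'_{\ominus_\top}$ define unique least and greatest derivatives for $f$.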
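
The two minus operators can be tried out on subsets of a small finite universe. In this sketch `UNIVERSE`, `minus_bot` and `minus_top` are hypothetical names, and complement is taken relative to the assumed universe {0, ..., 4}.

```python
UNIVERSE = frozenset(range(5))   # hypothetical finite carrier {0, ..., 4}


def act(a, change):
    """Apply a change (p, q): add p, then remove q."""
    p, q = change
    return (a | p) - q


def minus_bot(a, b):
    """Change from b to a that adds only what is missing and removes all of not-a."""
    return (a - b, UNIVERSE - a)


def minus_top(a, b):
    """Change from b to a that adds all of a and removes only the extras of b."""
    return (a, b - a)


a, b = frozenset({1, 2}), frozenset({2, 3})
lo, hi = minus_bot(a, b), minus_top(a, b)
print(lo, hi)
```

Both changes take `b` to `a`; they differ only in how much redundant adding and removing they perform, which is the freedom that the bounds on derivatives exploit.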
Theorem 3 then gives us bounds for all the derivatives on Boolean algebras: 


Corollary 1. Let $L$ be a complete Boolean algebra with the corresponding
change action $\hat{L}_\bowtie$, $\hat{A}$ be an arbitrary change action, and $f : A \to L$ be a function.
Then the derivatives of $f$ are precisely those functions $f' : A \times \Delta A \to \Delta L$
such that


$$f'_{\ominus_\bot} \leq_\Delta f' \leq_\Delta f'_{\ominus_\top}$$

This makes Theorem 3 actually usable in practice, since we have concrete
definitions for our bounds (which we will make use of in Sect. 4.2).


4 Derivatives for Non-recursive Datalog 


We now want to apply the theory we have developed to the specific case of the 
semantics of Datalog. Giving a differentiable semantics for Datalog will lead us 
to a strategy for performing incremental evaluation and maintenance of Datalog 
programs. To begin with, we will restrict ourselves to the non-recursive fragment 
of the language—the formulae that make up the right hand sides of Datalog rules. 
We will tackle the full program semantics in a later section, once we know how 
to handle fixpoints. 

Although the techniques we are using should work for any language, Datalog 
provides a non-trivial case study where the need for incremental computation is 
real and pressing, as we saw in Sect. 1. 


4.1 Semantics of Datalog Formulae 


Datalog is usually given a logical semantics where formulae are interpreted as 
first-order logic predicates and the semantics of a program is the set of models of 
its constituent predicates. We will instead give a simple denotational semantics 
(as is typical when working with fixpoints, see e.g. [17]) that treats a Datalog 
formula as directly denoting a relation, i.e. a set of named tuples, with variables 
ranging over a finite schema. 
Definition 8. A schema $\Gamma$ is a finite set of names. A named tuple over $\Gamma$ is
an assignment of a value $v_i$ for each name $x_i$ in $\Gamma$. Given disjoint schemata
$\Gamma = \{x_1, \ldots, x_n\}$ and $\Sigma = \{y_1, \ldots, y_m\}$, the selection function $\sigma_\Gamma$ is defined as


$$\sigma_\Gamma(\{x_1 \mapsto v_1, \ldots, x_n \mapsto v_n, y_1 \mapsto w_1, \ldots, y_m \mapsto w_m\}) := \{x_1 \mapsto v_1, \ldots, x_n \mapsto v_n\}$$


i.e. $\sigma_\Gamma$ restricts a named tuple over $\Gamma \cup \Sigma$ into a tuple over $\Gamma$ with the same
values for the names in $\Gamma$. We denote the elementwise extension of $\sigma_\Gamma$ to sets
of tuples also as $\sigma_\Gamma$.


We will adopt the usual closed-world assumption to give a denotation to 
negation. 


Definition 9. For any schema $\Gamma$, there exists a universal relation $U_\Gamma$. Negation
on relations can then be defined as


$$\neg R := U_\Gamma \setminus R$$

This makes $\mathrm{Rel}_\Gamma$, the set of all subsets of $U_\Gamma$, a complete Boolean algebra.


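Definition 10. A Datalog formula $T$ whose free term variables are contained
in $\Gamma$ denotes a function from $\mathrm{Rel}_\Gamma^n$ to $\mathrm{Rel}_\Gamma$.


$$[\![\_]\!]_\Gamma : \mathrm{Formula} \to \mathrm{Rel}_\Gamma^n \to \mathrm{Rel}_\Gamma$$


If $\mathcal{R} = (R_1, \ldots, R_n)$ is a choice of a relation $R_i$ for each of the variables $R_i$,
$[\![T]\!](\mathcal{R})$ is inductively defined according to the rules in Fig. 1.


$$
\begin{aligned}
[\![\top]\!]_\Gamma(\mathcal{R}) &:= U_\Gamma & [\![T \wedge U]\!]_\Gamma(\mathcal{R}) &:= [\![T]\!]_\Gamma(\mathcal{R}) \cap [\![U]\!]_\Gamma(\mathcal{R}) \\
[\![\bot]\!]_\Gamma(\mathcal{R}) &:= \emptyset & [\![T \vee U]\!]_\Gamma(\mathcal{R}) &:= [\![T]\!]_\Gamma(\mathcal{R}) \cup [\![U]\!]_\Gamma(\mathcal{R}) \\
[\![R_j]\!]_\Gamma(\mathcal{R}) &:= R_j & [\![\neg T]\!]_\Gamma(\mathcal{R}) &:= \neg [\![T]\!]_\Gamma(\mathcal{R}) \\
[\![\exists x.T]\!]_\Gamma(\mathcal{R}) &:= \sigma_\Gamma([\![T]\!]_{\Gamma \cup \{x\}}(\mathcal{R}))
\end{aligned}
$$


Fig. 1. Formula semantics for Datalog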


Since $\mathrm{Rel}_\Gamma$ is a complete Boolean algebra, and so is $\mathrm{Rel}^n_\Gamma$, $\llbracket T \rrbracket_\Gamma$ is a function
between complete Boolean algebras. For brevity, we will often leave the schema
implicit, as it is clear from the context.
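
As an illustration of Definitions 8 and 9 and of the existential case in Fig. 1, the following minimal Python sketch represents a named tuple as a frozenset of (name, value) pairs and a relation as a set of such tuples. The helper names `named_tuple`, `select`, `universal` and `negate`, and the small finite value domain, are assumptions made for this example; they do not come from the paper.

```python
from itertools import product

def named_tuple(**fields):
    """A named tuple: an assignment of a value to each name (hashable)."""
    return frozenset(fields.items())

def select(gamma, relation):
    """sigma_Gamma: restrict every tuple of the relation to the names in gamma."""
    return {frozenset((n, v) for n, v in t if n in gamma) for t in relation}

def universal(gamma, domain):
    """U_Gamma over an assumed finite domain: every assignment of values to names."""
    names = sorted(gamma)
    return {frozenset(zip(names, vs)) for vs in product(domain, repeat=len(names))}

def negate(gamma, domain, relation):
    """Closed-world negation: the complement relative to U_Gamma."""
    return universal(gamma, domain) - relation

# child(x, y) as a relation over the schema {x, y}
child = {named_tuple(x=1, y=2), named_tuple(x=1, y=3), named_tuple(x=2, y=3)}
has_child = select({"x"}, child)            # denotes "exists y. child(x, y)"
leaf = negate({"x"}, {1, 2, 3}, has_child)  # denotes "not exists y. child(x, y)"
```

Here the existential quantifier is interpreted by projecting away the bound name, and negation only makes sense because the universe of values is fixed in advance.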


4.2 Differentiability of Datalog Formula Semantics 


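In order to actually perform our incremental computation, we first need to
provide a concrete derivative for the semantics of Datalog formulae. Of course, since
$\llbracket T \rrbracket_\Gamma$ is a function between the complete Boolean algebras $\mathrm{Rel}^n_\Gamma$ and $\mathrm{Rel}_\Gamma$, and
we know that the corresponding change actions $\widehat{\mathrm{Rel}^n_\Gamma}$ and $\widehat{\mathrm{Rel}_\Gamma}$ are complete,
this guarantees the existence of a derivative for $\llbracket T \rrbracket$.

$$\begin{aligned}
\Delta(\bot) &:= \bot & \nabla(\bot) &:= \bot \\
\Delta(\top) &:= \bot & \nabla(\top) &:= \bot \\
\Delta(R_j) &:= \Delta R_j & \nabla(R_j) &:= \nabla R_j \\
\Delta(T \vee U) &:= \Delta(T) \vee \Delta(U) & \nabla(T \vee U) &:= (\nabla(T) \wedge \neg X(U)) \vee (\nabla(U) \wedge \neg X(T)) \\
\Delta(T \wedge U) &:= (\Delta(T) \wedge X(U)) \vee (\Delta(U) \wedge X(T)) & \nabla(T \wedge U) &:= (\nabla(T) \wedge U) \vee (T \wedge \nabla(U)) \\
\Delta(\neg T) &:= \nabla(T) & \nabla(\neg T) &:= \Delta(T) \\
\Delta(\exists x.T) &:= \exists x.\Delta(T) & \nabla(\exists x.T) &:= \exists x.\nabla(T) \wedge \neg\exists x.X(T) \\
X(R) &:= (R \vee \Delta(R)) \wedge \neg\nabla(R)
\end{aligned}$$

Fig. 2. Upwards and downwards formula derivatives for Datalog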

Unfortunately, this does not necessarily provide us with an efficient derivative
for $\llbracket T \rrbracket$. The derivatives that we know how to compute (Corollary 1) rely on
computing $f(a \oplus \delta a)$ itself, which is the very thing we were trying to avoid
computing!

Of course, given a concrete definition of a derivative we can simplify this
expression and hopefully make it easier to compute. But we also know from
Corollary 1 that any function bounded by the minimal and maximal derivatives
is a valid derivative, and we can therefore optimize anywhere within that range
to make a trade-off between ease of computation and precision.⁹

There is also the question of how to compute the derivative. Since the change
set for $\mathrm{Rel}_\Gamma$ is a subset of $\mathrm{Rel}_\Gamma \times \mathrm{Rel}_\Gamma$, it is possible and indeed very natural
to compute the two components via a pair of Datalog formulae, which allows 
us to reuse an existing Datalog formula evaluator. Indeed, if this process is 
occurring in an optimizing compiler, the derivative formulae can themselves be 
optimized. This is very beneficial in practice, since the initial formulae may be 
quite complex. 

This does give us additional constraints that the derivative formulae must 
satisfy: for example, we need to be able to evaluate them; and we may wish to 
pick formulae that will be easy or cheap for our evaluation engine to compute, 
even if they compute a less precise derivative. 

The upshot of these considerations is that the optimal choice of derivatives 
is likely to be quite dependent on the precise variant of Datalog being evaluated, 
and the specifics of the evaluation engine. Here is one possibility, which is the 
one used at Semmle. 


9 The idea of using an approximation to the precise derivative, and a soundness
condition, appears in Bancilhon [9].

A concrete Datalog formula derivative. In Fig. 2, we define a “symbolic”
derivative operator as a pair of mutually recursive functions, $\Delta$ and $\nabla$, which
turn a Datalog formula T into new formulae that compute the upwards and
downwards parts of the derivative, respectively. Our definition uses an auxiliary
function, X, which computes the “neXt” value of a term by applying the upwards
and downwards derivatives. As is typical for a derivative, the new formulae will
have additional free relation variables for the upwards and downwards derivatives
of the free relation variables of T, denoted as $\Delta R$ and $\nabla R$ respectively.
Evaluating the formula as a derivative means evaluating it as a normal Datalog
formula with the new relation variables set to the input relation changes.

While the definitions mostly exhibit the dualities we would expect between 
corresponding operators, there are a few asymmetries to explain. 

The asymmetry between the cases for $\Delta(T \vee U)$ and $\nabla(T \wedge U)$ is for operational
reasons. The symmetrical version of $\Delta(T \vee U)$ is $(\Delta(T) \wedge \neg U) \vee (\Delta(U) \wedge \neg T)$
(which is also precise). The reason we omit the negated conjuncts is simply that
they are costly to compute and not especially helpful to our evaluation engine.

The asymmetry between the cases for $\exists$ is because our dialect of Datalog
does not have a primitive universal quantifier. If we did have one, the cases for
$\exists$ would be dual to the corresponding cases for $\forall$.


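Theorem 4 (Concrete Datalog formula derivatives). Let $\Delta, \nabla, X : \mathrm{Formula} \to \mathrm{Formula}$
be mutually recursive functions defined by structural induction as in Fig. 2.

Then $\Delta(T)$ and $\nabla(T)$ are disjoint, and for any schema $\Gamma$ and any Datalog
formula T whose free term variables are contained in $\Gamma$,
$\llbracket T \rrbracket'_\Gamma := (\llbracket \Delta(T) \rrbracket_\Gamma, \llbracket \nabla(T) \rrbracket_\Gamma)$ is a derivative for $\llbracket T \rrbracket_\Gamma$.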
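
To make the mechanics of Fig. 2 concrete, here is a minimal Python sketch restricted to the quantifier-free fragment, with relations modelled as subsets of a fixed finite universe. The tuple encoding of formulae, the function names `up`, `down`, `nxt` and `evaluate`, and the `"up:R"` / `"down:R"` naming of the change variables are all assumptions of this example rather than anything prescribed by the paper.

```python
TOP, BOT = ("top",), ("bot",)

def Var(name): return ("var", name)
def Or(t, u): return ("or", t, u)
def And(t, u): return ("and", t, u)
def Not(t): return ("not", t)

def up(t):
    """Upwards derivative Delta(T), following Fig. 2 (no quantifiers)."""
    tag = t[0]
    if tag in ("top", "bot"):
        return BOT
    if tag == "var":
        return Var("up:" + t[1])
    if tag == "or":
        return Or(up(t[1]), up(t[2]))
    if tag == "and":
        return Or(And(up(t[1]), nxt(t[2])), And(up(t[2]), nxt(t[1])))
    if tag == "not":
        return down(t[1])
    raise ValueError(tag)

def down(t):
    """Downwards derivative Nabla(T), following Fig. 2 (no quantifiers)."""
    tag = t[0]
    if tag in ("top", "bot"):
        return BOT
    if tag == "var":
        return Var("down:" + t[1])
    if tag == "or":
        return Or(And(down(t[1]), Not(nxt(t[2]))), And(down(t[2]), Not(nxt(t[1]))))
    if tag == "and":
        return Or(And(down(t[1]), t[2]), And(t[1], down(t[2])))
    if tag == "not":
        return up(t[1])
    raise ValueError(tag)

def nxt(t):
    """X(T): the next value of T, obtained by applying both derivatives."""
    return And(Or(t, up(t)), Not(down(t)))

def evaluate(t, env, universe):
    """Set-based semantics of a formula; env maps variable names to sets."""
    tag = t[0]
    if tag == "top":
        return set(universe)
    if tag == "bot":
        return set()
    if tag == "var":
        return set(env[t[1]])
    if tag == "or":
        return evaluate(t[1], env, universe) | evaluate(t[2], env, universe)
    if tag == "and":
        return evaluate(t[1], env, universe) & evaluate(t[2], env, universe)
    if tag == "not":
        return set(universe) - evaluate(t[1], env, universe)
    raise ValueError(tag)

universe = set(range(1, 7))
env = {"R": {1, 2, 3}, "S": {3, 4},
       "up:R": {5}, "down:R": {1},      # change to R: insert 5, delete 1
       "up:S": set(), "down:S": {3}}    # change to S: delete 3
T = And(Var("R"), Not(Var("S")))

old = evaluate(T, env, universe)
added = evaluate(up(T), env, universe)
removed = evaluate(down(T), env, universe)
incremental = (old | added) - removed
```

Evaluating the derivative formulae is just ordinary formula evaluation in an environment that also binds the change variables, which is what allows an existing evaluator to be reused.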


We can give a derivative for our treeP predicate by mechanically applying 
the recursive functions defined in Fig. 2. 


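$$\Delta(\mathit{treeP}(x)) = p(x) \wedge \exists y.(\mathit{child}(x, y) \wedge \Delta(\mathit{treeP}(y))) \wedge \neg\exists y.(\mathit{child}(x, y) \wedge \neg X(\mathit{treeP}(y)))$$

$$\nabla(\mathit{treeP}(x)) = p(x) \wedge \exists y.(\mathit{child}(x, y) \wedge \nabla(\mathit{treeP}(y)))$$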


The upwards difference in particular is not especially easy to compute. If we 
naively compute it, the third conjunct requires us to recompute the whole of the 
recursive part. However, the second conjunct gives us a guard: if it is empty
then the whole formula will be too, so we only need to evaluate the third conjunct
if the second conjunct is non-empty, i.e. if there is some change in the body of
the existential. 

This shows that our derivatives aren’t a panacea: it is simply hard to compute
downwards differences for $\exists$ (and, equivalently, upwards differences for $\forall$) because
we must check that there is no other way of deriving the same facts.¹⁰ However,
we can still avoid the re-evaluation in many cases, and the inefficiency is local
to this subformula.

10 The “support” data structures introduced by [25] are an attempt to avoid this issue
by tracking the number of derivations of each tuple.


4.3 Extensions to Datalog 


Our formulation of Datalog formula semantics and derivatives is generic and 
modular, so it is easy to extend the language with new formula constructs: all 
we need to do is add cases for $\Delta$ and $\nabla$.

In fact, because we are using a complete change action, we can always do 
this by using the maximal or minimal derivative. This justifies our claim that 
we can support arbitrary additional formula constructs: although the maximal 
and minimal derivatives are likely to be impractical, having them available as 
options means that we will never be completely stymied. 

This is important in practice: here is a real example from Semmle’s variant
of Datalog. This includes a kind of aggregate which has well-defined recursive
semantics. Aggregates have the form

$$r = \mathrm{agg}(p)(vs \mid T \mid U)$$

where agg refers to an aggregation function (such as “sum” or “min”), vs is a
sequence of variables, p and r are variables, T is a formula possibly mentioning
vs, and U is a formula possibly mentioning vs and p. The full details can be
found in Moor and Baars [34], but for example this allows us to write


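$$\begin{aligned}
\mathit{height}(n, h) \leftarrow\ & \neg\exists c.(\mathit{child}(n, c)) \wedge h = 0 \\
& \vee \exists h'.(h' = \max(p)(c \mid \mathit{child}(n, c) \mid \mathit{height}(c, p)) \wedge h = h' + 1)
\end{aligned}$$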


which recursively computes the height of a node in a tree. 
Here is an upwards derivative for an aggregate formula: 


$$\Delta(r = \mathrm{agg}(p)(vs \mid T \mid U)) := \exists vs.(T \wedge \Delta(U)) \wedge r = \mathrm{agg}(p)(vs \mid T \mid U)$$


While this isn’t a precise derivative, it is still substantially cheaper than re- 
evaluating the whole subformula, as the first conjunct acts as a guard, allowing 
us to skip the second conjunct when U has not changed. 


5 Changes on Functions 


So far we have defined change actions for the kinds of things that typically make 
up data, but we would also like to have change actions on functions. This would 
allow us to define derivatives for higher-order languages (where functions are 
first-class); and for semantic operators like fixpoint operators $\mathrm{fix} : (A \to A) \to A$,
which also operate on functions. 

Function spaces, however, differ from products and disjoint unions in that 
there is no obvious “best” change action on $A \to B$. Therefore instead of trying
to define a single choice of change action, we will instead pick out subsets of 
function spaces which have “well-behaved” change actions. 

Definition 11 (Functional Change Action). Given change actions $\hat{A}$ and $\hat{B}$
and a set $U \subseteq A \to B$, a change action $\hat{U} = (U, \Delta U, \oplus_U)$ is functional whenever
the evaluation map $\mathrm{ev} : U \times A \to B$ is differentiable, that is to say, whenever
there exists a function $\mathrm{ev}' : (U \times A) \times (\Delta U \times \Delta A) \to \Delta B$ such that:

$$(f \oplus_U \delta f)(a \oplus_A \delta a) = f(a) \oplus_B \mathrm{ev}'((f, a), (\delta f, \delta a))$$

We will write $\hat{U} \subseteq \hat{A} \Rightarrow \hat{B}$ whenever $U \subseteq A \to B$ and $\hat{U}$ is functional.


There are two reasons why functional change actions are usually associated
with a subset $U \subseteq A \to B$. Firstly, it allows us to restrict ourselves to spaces
of monotone or continuous functions. But more importantly, functional change
actions are necessarily made up of differentiable functions, and thus a functional
change action may not exist for the entire function space $A \to B$.


Proposition 10. Let $\hat{U} \subseteq \hat{A} \Rightarrow \hat{B}$ be a functional change action. Then every
$f \in U$ is differentiable, with a derivative $f'$ given by:

$$f'(x, \delta x) = \mathrm{ev}'((f, x), (0, \delta x))$$


5.1 Pointwise Functional Change Actions 


Even if we restrict ourselves to the differentiable functions between $\hat{A}$ and $\hat{B}$, it
is hard to find a concrete functional change action for this set. Fortunately, in 
many important cases there is a simple change action on the set of differentiable 
functions. 


Definition 12 (Pointwise functional change action). Let $\hat{A}$ and $\hat{B}$ be
change actions. The pointwise functional change action $\hat{A} \Rightarrow_{pt} \hat{B}$, when it
is defined, is given by $(\hat{A} \Rightarrow \hat{B}, A \to \Delta B, \oplus_\to)$, with the monoid structure
$(A \to \Delta B, \cdot_\to, 0_\to)$ and the action $\oplus_\to$ defined by:

$$(f \oplus_\to \delta f)(x) := f(x) \oplus_B \delta f(x)$$


That is, a change is given pointwise, mapping each point in the domain to a 
change in the codomain. 

The above definition is not always well-typed, since given $f : \hat{A} \to \hat{B}$ and
$\delta f : A \to \Delta B$ there is no guarantee that $f \oplus_\to \delta f$ is differentiable. We present
two sufficient criteria that guarantee this.


Theorem 5. Let $\hat{A}$ and $\hat{B}$ be change actions, and suppose that $\hat{B}$ satisfies one
of the following conditions:

- $\hat{B}$ is a complete change action.
- The change action $\widehat{\Delta B} := (\Delta B, \Delta B, \cdot_B)$ is complete and $\oplus_B : B \times \Delta B \to B$
  is differentiable.

Then the pointwise functional change action $(\hat{A} \Rightarrow \hat{B}, A \to \Delta B, \oplus_\to)$ is well
defined.¹¹


As a direct consequence of this, it follows that whenever L is a Boolean algebra
(and hence has a complete change action), the pointwise functional change action
$\hat{A} \Rightarrow_{pt} \hat{L}_{\bowtie}$ is well-defined.

Pointwise functional change actions are functional in the sense of Definition 11.
Moreover, the derivative of the evaluation map is quite easy to compute.


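Proposition 11 (Derivatives of the evaluation map). Let $\hat{A}$ and $\hat{B}$ be
change actions such that the pointwise functional change action $\hat{A} \Rightarrow_{pt} \hat{B}$ is well
defined, and let $f : \hat{A} \to \hat{B}$, $a \in A$, $\delta a \in \Delta A$, $\delta f \in A \to \Delta B$.

Then the following are both derivatives of the evaluation map:

$$\begin{aligned}
\mathrm{ev}'_1((f, a), (\delta f, \delta a)) &:= f'(a, \delta a) \cdot \delta f(a \oplus \delta a) \\
\mathrm{ev}'_2((f, a), (\delta f, \delta a)) &:= \delta f(a) \cdot (f \oplus \delta f)'(a, \delta a)
\end{aligned}$$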


A functional change action merely tells us that a derivative of the evaluation 
map exists—a pointwise change action actually gives us a definition of it. In 
practice, this means that we will only be able to use the results in Sect. 6.2 
(incremental computation and derivatives of fixpoints) when we have pointwise 
change actions, or where we have some other way of computing a derivative of 
the evaluation map. 
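
The two derivatives of the evaluation map can be checked on a toy instance. The sketch below assumes the change action on the integers in which changes are integers and both the action and the monoid operation are addition (which is complete, since differences always exist); the functions `f`, `df` and the helper names are illustrative choices, not taken from the paper.

```python
def f(x):
    """The base function, here squaring."""
    return x * x

def f_prime(a, da):
    """A derivative of f for integer addition: f(a + da) = f(a) + f_prime(a, da)."""
    return f(a + da) - f(a)

def df(x):
    """A pointwise function change: maps each point to a change in the codomain."""
    return 3 * x

def apply_change(g, dg):
    """The pointwise action: (g ⊕ dg)(x) = g(x) ⊕ dg(x)."""
    return lambda x: g(x) + dg(x)

def ev_prime_1(a, da):
    """ev'_1: first change the argument (via f'), then apply the function change."""
    return f_prime(a, da) + df(a + da)

def ev_prime_2(a, da):
    """ev'_2: first apply the function change at a, then move the argument."""
    g = apply_change(f, df)
    g_prime = g(a + da) - g(a)      # a derivative of f ⊕ df, by difference
    return df(a) + g_prime

updated = apply_change(f, df)
```

Both functions describe the same overall change from `f(a)` to `(f ⊕ df)(a ⊕ da)`; they differ only in the order in which the function change and the argument change are applied.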


6 Directed-Complete Partial Orders and Fixpoints 


Directed-complete partial orders (dcpos) equipped with a least element are an
important class of posets. They allow us to take fixpoints of (Scott-)continuous 
maps, which is important for interpreting recursion in program semantics. 


6.1 Dcpos 


As before, we can define change actions on dcpos, rather than sets, as change
actions whose base and change sets are endowed with a dcpo structure, and 
where the monoid operation and action are (Scott-)continuous. 


Definition 13. A change action $\hat{A}$ is continuous if

- $A$ and $\Delta A$ are dcpos.
- $\oplus$ is Scott-continuous as a map from $A \times \Delta A \to A$.
- $\cdot$ is Scott-continuous as a map from $\Delta A \times \Delta A \to \Delta A$.


11 Either of these conditions is enough to guarantee that the pointwise functional 
change action is well defined, but it can be the case that B satisfies neither and 
yet pointwise change actions into B do exist. A precise account of when pointwise 
functional change actions exist is outside the scope of this paper. 

Unlike posets, the change order $\leq_\Delta$ does not, in general, induce a dcpo on
$\Delta A$. As a counterexample, consider the change action $(\overline{\mathbb{N}}, \mathbb{N}, +)$, where $\overline{\mathbb{N}}$ denotes
the dcpo of natural numbers extended with positive infinity.

A key example of a continuous change action is the $\hat{L}_{\bowtie}$ change action on
Boolean algebras.


Proposition 12 (Boolean algebra continuity). Let L be a Boolean algebra.
Then $\hat{L}_{\bowtie}$ is a continuous change action.


For a general overview of results in domain theory and dcpos, we refer the 
reader to an introductory work such as [2], but we state here some specific results 
that we shall be using, such as the following, whose proof can be found in [2, 
Lemma 3.2.6]: 


Proposition 13. A function $f : A \times B \to C$ is continuous iff it is continuous
in each variable separately. 


It is a well-known result in standard calculus that the limit of an absolutely
convergent sequence of differentiable functions $\{f_i\}$ is itself differentiable, and
its derivative is equal to the limit of the derivatives of the $f_i$. A consequence of
Proposition 13 is the following analogous result:


Corollary 2. Let $\hat{A}$ and $\hat{B}$ be change actions, with $\hat{B}$ continuous and let $\{f_i\}$
and $\{f'_i\}$ be I-indexed directed sets of functions in $A \to B$ and $A \times \Delta A \to \Delta B$
respectively.

Then, if for every $i \in I$ it is the case that $f'_i$ is a derivative of $f_i$, then $\bigsqcup_{i \in I} f'_i$
is a derivative of $\bigsqcup_{i \in I} f_i$.


6.2 Fixpoints 


Fixpoints appear frequently in the semantics of languages with recursion. If we 
can give a generic account of how to compute fixpoints using change actions, 
then this gives us a compositional way of extending a derivative for the non- 
recursive semantics of a language to a derivative that can also handle recursion. 
We will later apply this technique to create a derivative for the semantics of full 
recursive Datalog (Sect. 7.2). 


Iteration functions. Over directed-complete partial orders we can define a 
least fixpoint operator lfp in terms of the iteration function iter: 

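$$\begin{aligned}
\mathrm{iter} &: (A \to A) \times \mathbb{N} \to A \\
\mathrm{iter}(f, 0) &:= \bot \\
\mathrm{iter}(f, n) &:= f^n(\bot) \\
\mathrm{lfp} &: (A \to A) \to A \\
\mathrm{lfp}(f) &:= \bigsqcup_{n \in \mathbb{N}} \mathrm{iter}(f, n) \quad \text{(where } f \text{ is continuous)}
\end{aligned}$$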

The iteration function is the basis for all the results in this section: we can
take a partial derivative with respect to n, and this will give us a way to get
to the next iteration incrementally; and we can take the partial derivative with
respect to f, and this will give us a way to get from iterating $f$ to iterating
$f \oplus \delta f$.


Incremental computation of fixpoints. The following theorems provide a 
generalization of semi-naive evaluation to any differentiable function over a con- 
tinuous change action. Throughout this section we will assume that we have a 
continuous change action A, and any reference to the change action N will refer 
to the monoidal change action on the naturals defined in Sect. 2.1. 

Since we are trying to incrementalize the iterative step, we start by taking 
the partial derivative of iter with respect to n. 


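Proposition 14 (Derivative of the iteration map with respect to n). Let
$\hat{A}$ be a complete change action and let $f : A \to A$ be a differentiable function.
Then iter is differentiable with respect to its second argument, and a partial
derivative is given by:

$$\begin{aligned}
\partial_2\mathrm{iter} &: (A \to A) \times \mathbb{N} \times \Delta\mathbb{N} \to \Delta A \\
\partial_2\mathrm{iter}(f, 0, m) &:= \mathrm{iter}(f, m) \ominus \mathrm{iter}(f, 0) \\
\partial_2\mathrm{iter}(f, n + 1, m) &:= f'(\mathrm{iter}(f, n), \partial_2\mathrm{iter}(f, n, m))
\end{aligned}$$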


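By using the following recurrence relation, we can then compute $\partial_2\mathrm{iter}$ along
with iter simultaneously:

$$\begin{aligned}
\mathrm{recur}_f &: A \times \Delta A \to A \times \Delta A \\
\mathrm{recur}_f(\bot, \bot) &:= (\bot, f(\bot) \ominus \bot) \\
\mathrm{recur}_f(a, \delta a) &:= (a \oplus \delta a, f'(a, \delta a))
\end{aligned}$$

Which has the property that

$$\mathrm{recur}_f^{n+1}(\bot, \bot) = (\mathrm{iter}(f, n), \partial_2\mathrm{iter}(f, n, 1))$$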


This gives us a way to compute a fixpoint incrementally, by adding succes- 
sive changes to an accumulator until we reach it. This is exactly how semi-naive 
evaluation works: you compute the delta relation and the accumulator simulta- 
neously, adding the delta into the accumulator at each stage until it becomes 
the final output. 
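
The recurrence above can be sketched in Python for transitive closure. To keep the example short it assumes the simpler monoidal change action on finite sets, where a change is a set of tuples to insert and both the action and the monoid operation are union, rather than the two-component change action on Boolean algebras; the names `compose`, `semi_naive_lfp`, `step` and `step_prime` are illustrative.

```python
def compose(r, s):
    """Relational composition r ; s on sets of pairs."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

def semi_naive_lfp(f, f_prime):
    """Iterate recur_f(a, da) = (a ⊕ da, f'(a, da)) until the change adds nothing.

    With ⊥ = ∅ and union as the action, f(∅) is itself a change from ∅ to f(∅),
    so it plays the role of f(⊥) ⊖ ⊥ in the first step.
    """
    acc, delta = frozenset(), frozenset(f(frozenset()))
    while not delta <= acc:
        # The right-hand side is evaluated with the old acc, as recur_f requires.
        acc, delta = acc | delta, frozenset(f_prime(acc, delta))
    return acc

edges = frozenset({(1, 2), (2, 3), (3, 4)})

def step(x):
    """One round of the rule path(x, z) <- edge(x, z) or (path(x, y) and edge(y, z))."""
    return edges | compose(x, edges)

def step_prime(x, dx):
    """A derivative of step for union changes: step(x | dx) == step(x) | step_prime(x, dx)."""
    return compose(dx, edges)

closure = semi_naive_lfp(step, step_prime)
```

Each round only joins the newly derived tuples against the edge relation, instead of re-joining the whole accumulator, which is the saving that semi-naive evaluation is known for.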


Theorem 6 (Incremental computation of least fixpoints). Let $\hat{A}$ be a
complete, continuous change action, $f : A \to A$ be continuous and differentiable.
Then $\mathrm{lfp}(f) = \bigsqcup_{n \in \mathbb{N}} \pi_1(\mathrm{recur}_f^n(\bot, \bot))$.¹²


12 Note that we have not taken the fixpoint of $\mathrm{recur}_f$, since it is not continuous.

Derivatives of fixpoints. In the previous section we have shown how to use
derivatives to compute fixpoints more efficiently, but we also want to take the
derivative of the fixpoint operator itself. A typical use case for this is where we
have calculated some fixpoint

$$F_E := \mathrm{fix}(\lambda X. F(E, X))$$

then update the parameter $E$ with some change $\delta E$ and wish to compute the
new value of the fixpoint, i.e.

$$F_{E \oplus \delta E} := \mathrm{fix}(\lambda X. F(E \oplus \delta E, X))$$


This can be seen as applying a change to the function whose fixpoint we are
taking. We go from computing the fixpoint of $F(E, \_)$ to computing the fixpoint
of $F(E \oplus \delta E, \_)$. If we have a pointwise functional change action then we can
express this change as a function giving the change at each point, that is:

$$\lambda X. F(E \oplus \delta E, X) \ominus F(E, X)$$


In Datalog this would allow us to update a recursively defined relation given 
an update to one of its non-recursive dependencies, or the extensional database. 
For example, we might want to take the transitive closure relation and update 
it by changing the edge relation e. 

However, computing these examples would require us to provide a derivative
for the fixpoint operator fix: we want to know how the resulting fixpoint changes
given a change to its input function.


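Definition 14 (Derivatives of fixpoints). Let $\hat{A}$ be a change action, let
$\hat{U} \subseteq \hat{A} \Rightarrow \hat{A}$ be a functional change action (not necessarily pointwise) and
suppose $\mathrm{fix}_U$ and $\mathrm{fix}_{\Delta A}$ are fixpoint operators for endofunctions on $U$ and $\Delta A$
respectively.

Then we define

$$\begin{aligned}
\mathrm{adjust} &: U \times \Delta U \to (\Delta A \to \Delta A) \\
\mathrm{adjust}(f, \delta f) &:= \lambda\, \delta a.\ \mathrm{ev}'((f, \mathrm{fix}_U(f)), (\delta f, \delta a)) \\
\mathrm{fix}'_U &: U \times \Delta U \to \Delta A \\
\mathrm{fix}'_U(f, \delta f) &:= \mathrm{fix}_{\Delta A}(\mathrm{adjust}(f, \delta f))
\end{aligned}$$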


The suggestively named $\mathrm{fix}'_U$ will in fact turn out to be a derivative, for
least fixpoints. The appearance of $\mathrm{ev}'$, a derivative of the evaluation map, in
the definition of adjust is also no coincidence: as evaluating a fixpoint consists 
of many steps of applying the evaluation map, so computing the derivative of 
a fixpoint consists of many steps of applying the derivative of the evaluation 
map. 


13 Perhaps surprisingly, the authors first discovered an expanded version of this formula, 
and it was only later that we realised the remarkable connection to ev’. 

Since lfp is characterized as the limit of a chain of functions, Corollary 2
suggests a way to compute its derivative. It suffices to find a derivative $\mathrm{iter}'_n$ of
each iteration map such that the resulting set $\{\mathrm{iter}'_n \mid n \in \mathbb{N}\}$ is directed, which
will entail that $\bigsqcup_{n \in \mathbb{N}} \mathrm{iter}'_n$ is a derivative of lfp.

These correspond to the first partial derivative of iter, this time with respect
to f. While we are differentiating with respect to f, we are still going to need
to define our derivatives inductively in terms of n.


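Proposition 15 (Derivative of the iteration map with respect to f).
iter is differentiable with respect to its first argument and a derivative is given by:

$$\begin{aligned}
\partial_1\mathrm{iter} &: (A \to A) \times \Delta(A \to A) \times \mathbb{N} \to \Delta A \\
\partial_1\mathrm{iter}(f, \delta f, 0) &:= \bot_{\Delta A} \\
\partial_1\mathrm{iter}(f, \delta f, n + 1) &:= \mathrm{ev}'((f, \mathrm{iter}(f, n)), (\delta f, \partial_1\mathrm{iter}(f, \delta f, n)))
\end{aligned}$$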


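As before, we can now compute $\partial_1\mathrm{iter}$ together with iter by mutual
recursion.¹⁴

$$\begin{aligned}
\mathrm{recur}_{f,\delta f} &: A \times \Delta A \to A \times \Delta A \\
\mathrm{recur}_{f,\delta f}(a, \delta a) &:= (f(a), \mathrm{ev}'((f, a), (\delta f, \delta a)))
\end{aligned}$$

Which has the property that

$$\mathrm{recur}^n_{f,\delta f}(\bot, \bot) = (\mathrm{iter}(f, n), \partial_1\mathrm{iter}(f, \delta f, n)).$$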


This indeed provides us with a function whose limit we can take. If we do so 
we will discover that it is exactly lfp’ (defined as in Definition 14), showing that 
lfp’ is a true derivative. 


Theorem 7 (Derivatives of least fixpoint operators). Let

- $\hat{A}$ be a continuous change action
- $U$ be the set of continuous functions $f : A \to A$, with a functional change
  action $\hat{U} \subseteq \hat{A} \Rightarrow \hat{A}$
- $f \in U$ be a continuous, differentiable function
- $\delta f \in \Delta U$ be a function change
- $\mathrm{ev}'$ be a derivative of the evaluation map which is continuous with respect to
  $a$ and $\delta a$.

Then lfp’ is a derivative of lfp.


Computing this derivative still requires computing a fixpoint—over the 
change lattice—but this may still be significantly less expensive than recom- 
puting the full new fixpoint. 
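
As a hedged illustration of Definition 14, the following Python sketch updates a transitive closure after new edges are inserted. It again assumes the insertion-only change action on finite sets (changes are sets, applied by union), so deletions are out of scope here; the derivative of the evaluation map is instantiated by hand for this one rule, and all function names are assumptions of the example.

```python
def compose(r, s):
    """Relational composition r ; s on sets of pairs."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

def lfp(step):
    """Naive least fixpoint by iteration from the empty set."""
    acc = set()
    while True:
        new = step(acc)
        if new == acc:
            return acc
        acc = new

def tc_step(edges):
    """f_E(X) = E ∪ (X ; E), whose least fixpoint is the transitive closure of E."""
    return lambda x: edges | compose(x, edges)

def adjust(edges, d_edges, old_fix):
    """adjust(f, δf) = λδa. ev'((f, fix(f)), (δf, δa)), using ev'_1.

    For this rule, f'(a, δa) = δa ; E and the pointwise function change induced
    by inserting d_edges is δf(x) = dE ∪ (x ; dE).
    """
    def step(da):
        moved = old_fix | da                                  # a ⊕ δa
        return compose(da, edges) | d_edges | compose(moved, d_edges)
    return step

def fix_prime(edges, d_edges, old_fix):
    """fix'(f, δf): a fixpoint computed over changes rather than over relations."""
    return lfp(adjust(edges, d_edges, old_fix))

edges = {(1, 2), (3, 4)}
d_edges = {(2, 3)}
old_closure = lfp(tc_step(edges))
change = fix_prime(edges, d_edges, old_closure)
new_closure = old_closure | change
```

The inner fixpoint ranges only over tuples that are new or affected, so when the inserted edges touch a small part of the graph it does far less work than recomputing the closure from scratch.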


14 In fact, the recursion here is not mutual: the first component does not depend on
the second. However, writing it in this way makes it amenable to computation by
fixpoint, and we will in fact be able to avoid the recomputation of $\mathrm{iter}_n$ when we
show that it is equivalent to lfp’.

7 Derivatives for Recursive Datalog


Given the non-recursive semantics for a language, we can extend it to handle 
recursive definitions using fixpoints. Section 6.2 lets us extend our derivative for 
the non-recursive semantics to a derivative for the recursive semantics, as well 
as letting us compute the fixpoints themselves incrementally. 

Again, we will demonstrate the technique with Datalog, although the app- 
roach is generic. 


7.1 Semantics of Datalog Programs 


First of all, we define the usual “immediate consequence operator” which
computes “one step” of our program semantics.


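Definition 15. Given a program $P = (P_1, \ldots, P_n)$, where $P_i$ is a predicate, with
schema $\Gamma_i$, the immediate consequence operator $\mathcal{I} : \mathrm{Rel}^n \to \mathrm{Rel}^n$ is defined as
follows:

$$\mathcal{I}(\mathcal{R}_1, \ldots, \mathcal{R}_n) := (\llbracket P_1 \rrbracket_{\Gamma_1}(\mathcal{R}_1, \ldots, \mathcal{R}_n), \ldots, \llbracket P_n \rrbracket_{\Gamma_n}(\mathcal{R}_1, \ldots, \mathcal{R}_n))$$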


That is, given a value for the program, we pass in all the relations to the 
denotation of each predicate, to get a new tuple of relations. 


Definition 16. The semantics of a program P is defined to be

$$\llbracket P \rrbracket := \mathrm{lfp}_{\mathrm{Rel}^n}(\mathcal{I})$$

and may be calculated by iterative application of $\mathcal{I}$ to $\bot$ until fixpoint is reached.
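
A small Python sketch of Definitions 15 and 16 for a two-predicate program may help. The program (mutually recursive `even` and `odd` over a successor relation), the hand-written denotations, and the names `immediate_consequence` and `program_semantics` are assumptions chosen for this example.

```python
def immediate_consequence(zero, succ):
    """Build the operator for the assumed program
         even(x) <- zero(x) or exists y. (succ(y, x) and odd(y))
         odd(x)  <- exists y. (succ(y, x) and even(y))
    Each component is the denotation of one predicate, given all relations.
    """
    def step(rels):
        even, odd = rels
        new_even = zero | {x for (y, x) in succ if y in odd}
        new_odd = {x for (y, x) in succ if y in even}
        return (frozenset(new_even), frozenset(new_odd))
    return step

def program_semantics(step, n):
    """Apply the operator to bottom (a tuple of n empty relations) until a fixpoint."""
    rels = tuple(frozenset() for _ in range(n))
    while True:
        new = step(rels)
        if new == rels:
            return rels
        rels = new

zero = {0}
succ = {(i, i + 1) for i in range(5)}
even, odd = program_semantics(immediate_consequence(zero, succ), 2)
```

This is the naive strategy: every round re-evaluates each predicate against the full relations, which is exactly the cost that the incremental techniques of Sect. 6.2 aim to reduce.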


Whether or not this program semantics exists will depend on whether the 
fixpoint exists. Typically this is ensured by constraining the program such that $\mathcal{I}$
is monotone (or, in the context of a dcpo, continuous). We do not require mono- 
tonicity to apply Theorem 6 (and hence we can incrementally compute fixpoints 
that happen to exist even though the generating function is not monotonic), but 
it is required to apply Theorem 7. 


7.2 Incremental Evaluation of Datalog 


We can easily extend a derivative for the formula semantics to a derivative for the 
immediate consequence operator $\mathcal{I}$. Putting this together with the results from
Sect. 6.2, we have now created modular proofs for the two main results, which 
allows us to preserve them in the face of changes to the underlying language. 


Corollary 3. Datalog program semantics can be evaluated incrementally. 


Corollary 4. Datalog program semantics can be incrementally maintained with 
changes to relations. 

Note that our approach makes no particular distinction between changes
to the extensional relations (adding or removing facts), and changes to the 
intensional relations (changing the definition). The latter simply amounts to 
a change to the denotation of that relation, which can be incrementally propa- 
gated in exactly the same way as we would propagate a change to the extensional 
relations. 


8 Related Work 


8.1 Change Actions and Incremental Computation 


Change structures. The seminal paper in this area is Cai et al. [14]. We deviate 
from that excellent paper in three regards: the inclusion of minus operators, the 
nature of function changes, and the use of dependent types. 

We have omitted minus operators from our definition because there are many 
interesting change actions that are not complete and so cannot have a minus 
operator. Where we can find a change structure with a minus operator, often 
we are forced to use unwieldy representations for change sets, and Cai et al. 
cite this as their reason for using a dependent type of changes. For example, 
the monoidal change actions on sets and lists are clearly useful for incremental 
computation on streams, yet they do not admit minus operators—instead, one 
would be forced to work with e.g. multisets admitting negative arities, as Cai 
et al. do. 

Our function changes (when well behaved) correspond to what Cai et al. 
call pointwise differences (see [14, section 2.2]). As they point out, you can 
reconstruct their function changes from pointwise changes and derivatives, so 
the two formulations are equivalent. 

The equivalence of our presentations means that our work should be compati- 
ble with their Incremental Lambda Calculus (see [14, section 3]). The derivatives 
we give in Sect. 4.2 are more or less a “change semantics” for Datalog (see [14, 
section 3.5]). 


S-acts. S-acts (i.e. the category of monoid actions on sets) and their categorical
structure have received a fair amount of attention over the years (Kilp, Knauer, 
and Mikhalev [30] is a good overview). However, there is a key difference between 
change actions considered as a category (CAct) and the category of S-acts 
(SAct): the objects of SAct all maintain the same monoid structure, whereas 
we are interested in changing both the base set and the structure of the action. 


Derivatives of fixpoints. Arntzenius [5] gives a derivative operator for fix- 
points based on the framework in Cai et al. [14]. However, since we have different 
notions of function changes, the result is inapplicable as stated. In addition, we 
require a somewhat different set of conditions; in particular, we do not require 
our changes to always be increasing. 

8.2 Datalog


Incremental evaluation. The earliest interpretation of semi-naive evaluation 
as a derivative appears in Bancilhon [8]. The idea of using an approximate deriva- 
tive and the requisite soundness condition appears as a throwaway comment in 
Bancilhon and Ramakrishnan [9, section 3.2.2], and it would appear that nobody
has since developed that approach. 

As far as we know, traditional semi-naive is the state of the art in incremental, 
bottom-up, Datalog evaluation, and there are no strategies that accommodate 
additional language features such as parity-stratified negation and aggregates. 


Incremental maintenance. There is existing literature on incremental main- 
tenance of relational algebra expressions. 

Griffin, Libkin, and Trickey [24] following Qian and Wiederhold [35] compute 
differences with both an “upwards” and a “downwards” component, and produce 
a set of rules that look quite similar to those we derive in Theorem 4. However, 
our presentation is significantly more generic, handles recursive expressions, and 
works on set semantics rather than bag semantics.¹⁵

Several approaches [25,27]—most notably DReD—remove facts until one can 
start applying the rules again to reach the new fixpoint. Given a good way of 
deciding what facts to remove this can be quite efficient. However, such tech- 
niques tend to be tightly coupled to the domain. Although we know of no theo- 
retical reason why either approach should give superior performance when both 
are applicable, an empirical investigation of this could prove interesting. 

Other approaches [19,43] consider only restricted subsets of Datalog, or incur 
other substantial constraints. 


Embedding Datalog. Datafun (Arntzenius and Krishnaswami [6]) is a func- 
tional programming language that embeds Datalog, allowing significant improve- 
ments in genericity, such as the use of higher-order functions. Since we have 
directly defined a change action and derivative operator for Datalog, our work 
could be used as a “plugin” in the sense of Cai et al., allowing Datafun to com- 
pute its internal fixpoints incrementally, but also allowing Datafun expressions 
to be fully incrementally maintained. 

In a different direction, Cathcart Burn, Ong, and Ramsay [15] have proposed 
higher-order constrained Horn clauses (HoCHC), a new class of constraints for 
the automatic verification of higher-order programs. HoCHC may be viewed as 
a higher-order extension of Datalog. Change actions can be readily applied to 
organise an efficient semi-naive method for solving HoCHC systems. 


8.3 Differential λ-calculus


Another setting where derivatives of arbitrary higher-order programs have been
studied is the differential λ-calculus [20,21]. This is a higher-order, simply-typed
λ-calculus which allows for computing the derivative of a function, in a similar
way to the notion of derivative in Cai’s work and the present paper.

15 The same approach of finding derivatives would work with bag semantics, although
unfortunately the Boolean algebra structure is missing.

While there are clear similarities between the two systems, the most impor- 
tant difference is the properties of the derivatives themselves: in the differential 
λ-calculus, derivatives are guaranteed to be linear in their second argument,
whereas in our approach derivatives do not have this restriction but are instead 
required to satisfy a strong relation to the function that is being differentiated 
(see Definition 2). 

Families of denotational models for the differential λ-calculus have been
studied in depth [12,13,16,29], and the relationship between these and change actions
is the subject of ongoing work.


8.4 Higher-Order Automatic Differentiation 


Automatic differentiation [23] is a technique that allows for efficiently computing 
the derivative of arbitrary programs, with applications in probabilistic modeling 
[31] and machine learning [10] among other areas. In recent times, this tech- 
nique has been successfully applied to higher-order languages [11,41]. While 
some approaches have been suggested [28,33], a general theoretical framework 
for this technique is still a matter of open research. 

To this purpose, some authors have proposed the incremental λ-calculus as
a foundational framework on which models of automatic differentiation can be
based [28]. We believe our change actions are better suited to this purpose than
the incremental λ-calculus, since one can easily give them a synthetic differential
geometric reading (by interpreting $A$ as a Euclidean module and $\Delta A$ as its
corresponding spectrum, for example).


9 Conclusions and Future Work 


We have presented change actions and their properties, and used them to provide 
novel, compositional, strategies for incrementally evaluating and maintaining 
recursive functions, in particular the semantics of Datalog. 

The main avenue for future theoretical work is the categorical structure of 
change actions. This has begun to be explored by the authors in [4], where change 
actions are generalized to arbitrary Cartesian base categories and a construction 
is provided to obtain “canonical” Cartesian closed categories of change actions 
and differentiable maps. 

We hope that these generalizations would allow us to extend the theory of 
change actions towards other classes of models, such as synthetic differential 
geometry and domain theory. Some early results in [4] also indicate a connection 
between 2-categories and change actions which has yet to be fully mapped. 

The compositional nature of these techniques suggests that an approach like
that used in [22] could be used for an even more generic approach to automatic 
differentiation. 

In addition, there is plenty of scope for practical application of the techniques
given here to languages other than Datalog.

Abstract. Incremental computation requires propagating changes and
reusing intermediate results of base computations. Derivatives, as produced
by static differentiation [7], propagate changes but do not reuse
intermediate results, leading to wasteful recomputation. As a solution,
we introduce conversion to Cache-Transfer-Style, an additional program
transformation producing purely incremental functional programs that
create and maintain nested tuples of intermediate results. To prove CTS
conversion correct, we extend the correctness proof of static differentiation
from STLC to untyped λ-calculus via step-indexed logical relations,
and prove sound the additional transformation via simulation theorems.

To show ILC-based languages can improve performance relative to 
from-scratch recomputation, and that CTS conversion can extend its 
applicability, we perform an initial performance case study. We provide 
derivatives of primitives for operations on collections and incrementalize 
selected example programs using those primitives, confirming expected 
asymptotic speedups. 


1 Introduction 


After computing a base output from some base input, we often need to pro- 
duce updated outputs corresponding to updated inputs. Instead of rerunning 
the same base program on the updated input, incremental computation trans- 
forms the input change to an output change, potentially reducing asymptotic 
time complexity and significantly improving efficiency, especially for computa- 
tions running on large data sets. 

Incremental λ-Calculus (ILC) [7] is a recent framework for higher-order incremental
computation. ILC represents changes from a base value v1 to an updated
value v2 as a first-class change value dv. Since functions are first-class values,
change values include function changes.

ILC also statically transforms base programs to incremental programs, or
derivatives, which are functions mapping input changes to output changes.
Incremental language designers can then provide their language with (higher-order)
primitives (with their derivatives) that efficiently encapsulate incrementalizable
computation skeletons (such as tree-shaped folds), and ILC will incrementalize
higher-order programs written in terms of these primitives.

© The Author(s) 2019

Alas, ILC incrementalizes efficiently only self-maintainable computations [7,
Sect. 4.3], that is, computations whose output changes can be computed using
only input changes, but not the inputs themselves [11]. Few computations are
self-maintainable: for instance, mapping self-maintainable functions on a sequence is
self-maintainable, but dividing numbers is not! We elaborate on this problem in 
Sect. 2.1. In this paper, we extend ILC to non-self-maintainable computations. 
To this end, we must enable derivatives to reuse intermediate results created by 
the base computation. 

Many incrementalization approaches remember intermediate results through 
dynamic memoization: they typically use hashtables to memoize function results, 
or dynamic dependence graphs [1] to remember a computation trace. However, 
looking up intermediate results in such dynamic data structures has a runtime
cost that is hard to optimize; and reasoning on dynamic dependence graphs and 
computation traces is often complex. Instead, ILC produces purely functional 
programs, suitable for further optimizations and equational reasoning. 

To that end, we replace dynamic memoization with static memoization: fol- 
lowing Liu and Teitelbaum [20], we transform programs to cache-transfer style 
(CTS). A CTS function outputs its primary result along with caches of intermediate
results. These caches are just nested tuples whose structure is derived
from code, and accessing them does not involve looking up keys depending on 
inputs. Instead, intermediate results can be fetched from these tuples using stat- 
ically known locations. To integrate CTS with ILC, we extend differentiation to 
produce CTS derivatives: these can extract from caches any intermediate results 
they need, and produce updated caches for the next computation step. 

The correctness proof of static differentiation in CTS is challenging. First, we 
must show a forward simulation relation between two triples of reduction traces 
(the first triple being made of the source base evaluation, the source updated eval- 
uation and the source derivative evaluation; the second triple being made of the 
corresponding CTS-translated evaluations). Dealing with six distinct evaluation 
environments at the same time was error-prone on paper, and for this reason
we conducted the proof using Coq [26]. Second, the simulation relation must
not only track values but also caches, which are only partially updated while in
the middle of the evaluation of derivatives. Finally, we study the translation for
an untyped λ-calculus, while previous ILC correctness proofs were restricted to
simply-typed λ-calculus. Hence, we define which changes are valid via a logical
relation and show its fundamental property. Being in an untyped setting, our
logical relation is not indexed by types, but step-indexed. We study an untyped
language, but our work also applies to the erasure of typed languages. Formal- 
izing a type-preserving translation is left for future work because giving a type 
to CTS programs is challenging, as we shall explain. 

In addition to the correctness proof, we present preliminary experimental
results from three case studies. We obtain efficient incremental programs even
on non-self-maintainable functions.

We present our contributions as follows. First, we summarize ILC and illustrate
the need to extend it to remember intermediate results via CTS (Sect. 2).
Second, in our mechanized formalization (Sect. 3), we give a novel proof of
correctness for ILC differentiation for untyped λ-calculus, based on step-indexed
logical relations (Sect. 3.4). Third, building on top of ILC differentiation, we 
show how to transform untyped higher-order programs to CTS (Sect. 3.5) and 
we show that CTS functions and derivatives correctly simulate their non-CTS
counterparts (Sect. 3.7). Finally, in our case studies (Sect. 4), we compare the
performance of the generated code to the base programs. Section 4.4 discusses
limitations and future work. Section 5 discusses related work and Sect. 6 concludes.
Our mechanized proof in Coq, the case study material, and the extended
version of this paper with appendixes are available online at
https://github.com/yurug/cts.


2 ILC and CTS Primer 


In this section we exemplify ILC by applying it to an average function, show
why the resulting incremental program is asymptotically inefficient, and use CTS 
conversion and differentiation to incrementalize our example efficiently and speed 
it up asymptotically (as confirmed by benchmarks in Sect. 4.1). Further examples 
in Sect. 4 apply CTS to higher-order programs and suggest that CTS enables 
incrementalizing efficiently some core database primitives such as joins. 


2.1 Incrementalizing average via ILC 


Our example computes the average of a bag of numbers. After computing the
base output y1 of the average function on the base input bag xs1, we want to
update the output in response to a stream of updates to the input bag. Here
and throughout the paper, we contrast base vs updated inputs, outputs, values,
computations, and so on. For simplicity, we assume we have two updated inputs
xs2 and xs3 and want to compute two updated outputs y2 and y3. We express
this program in Haskell as follows:


average :: Bag Z → Z
average xs = let s = sum xs; n = length xs; r = div s n in r

average3 = let y1 = average xs1; y2 = average xs2; y3 = average xs3
           in (y1, y2, y3)


To compute the updated outputs y2 and y3 in average3 faster, we try using
ILC. For that, we assume that we receive not only updated inputs xs2 and xs3
but also input change dxs1 from xs1 to xs2 and input change dxs2 from xs2 to xs3.
A change dx from x1 to x2 describes the changes from base value x1 to updated
value x2, so that x2 can be computed via the update operator ⊕ as x1 ⊕ dx. A
nil change 0_x is a change from base value x to updated value x itself.

ILC differentiation automatically transforms the average function to its
derivative daverage :: Bag Z → Δ(Bag Z) → ΔZ. A derivative maps input
changes to output changes: here, dy1 = daverage xs1 dxs1 is a change from
base output y1 = average xs1 to updated output y2 = average xs2, hence
y2 = y1 ⊕ dy1.


Thanks to daverage’s correctness, we can rewrite average3 to avoid expensive
calls to average on updated inputs and use daverage instead:


incrementalAverage3 :: (Z, Z, Z)
incrementalAverage3 =
  let y1 = average xs1; dy1 = daverage xs1 dxs1
      y2 = y1 ⊕ dy1;    dy2 = daverage xs2 dxs2
      y3 = y2 ⊕ dy2
  in (y1, y2, y3)
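
The rewrite above can be tried out with a minimal runnable sketch. It rests on several assumptions that are not fixed by the text: bags are represented as lists, a bag change (the hypothetical type DBag) records inserted and removed elements, ΔZ is Z itself with ⊕ as addition, and the inputs are passed as arguments rather than being global. The names oplusBag and oplusInt are illustrative only, and the derivatives are written by hand rather than produced by differentiation.

```haskell
import Data.List ((\\))

-- Assumption: a bag is a plain list; order is irrelevant for sum and length.
type Bag = [Integer]

-- Assumption: a bag change lists inserted elements, then removed elements.
data DBag = DBag [Integer] [Integer]

-- Update operators (⊕) for bags and for integers.
oplusBag :: Bag -> DBag -> Bag
oplusBag xs (DBag ins del) = (xs \\ del) ++ ins

oplusInt :: Integer -> Integer -> Integer
oplusInt = (+)

average :: Bag -> Integer
average xs = let s = sum xs; n = toInteger (length xs); r = div s n in r

-- Self-maintainable derivatives: they never inspect the base input.
dsum, dlength :: Bag -> DBag -> Integer
dsum _ (DBag ins del) = sum ins - sum del
dlength _ (DBag ins del) = toInteger (length ins) - toInteger (length del)

-- Not self-maintainable: needs the base inputs a and b.
ddiv :: Integer -> Integer -> Integer -> Integer -> Integer
ddiv a da b db = div (a + da) (b + db) - div a b

-- Hand-written derivative of average; it recomputes s and n from xs.
daverage :: Bag -> DBag -> Integer
daverage xs dxs =
  let s = sum xs;                ds = dsum xs dxs
      n = toInteger (length xs); dn = dlength xs dxs
      dr = ddiv s ds n dn
  in dr

incrementalAverage3 :: Bag -> DBag -> DBag -> (Integer, Integer, Integer)
incrementalAverage3 xs1 dxs1 dxs2 =
  let xs2 = xs1 `oplusBag` dxs1
      y1 = average xs1;       dy1 = daverage xs1 dxs1
      y2 = y1 `oplusInt` dy1; dy2 = daverage xs2 dxs2
      y3 = y2 `oplusInt` dy2
  in (y1, y2, y3)
```

In this sketch the removed elements of a change are assumed to occur in the base bag; otherwise dsum and dlength would disagree with oplusBag.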


In general, also the value of a function f :: A → B can change from a base
value f1 to an updated value f2, mainly when f is a closure over changing data.
In that case, the change from base output f1 x1 to updated output f2 x2 is given
by df x1 dx, where df :: A → ΔA → ΔB is now a function change from f1
to f2. Above, average exemplifies the special case where f1 = f2 = f: then the
function change df is a nil change, and df x1 dx is a change from f1 x1 = f x1
to f2 x2 = f x2. That is, a nil function change for f is a derivative of f.


2.2 Self-maintainability and Efficiency of Derivatives 


Alas, derivatives are efficient only if they are self-maintainable, and daverage is
not, so incrementalAverage3 is no faster than average3! Consider the result of
differentiating average:


daverage :: Bag Z → Δ(Bag Z) → ΔZ
daverage xs dxs = let s = sum xs;    ds = dsum xs dxs
                      n = length xs; dn = dlength xs dxs
                      r = div s n;   dr = ddiv s ds n dn
                  in dr


Just like average combines sum, length, and div, its derivative daverage combines
those functions and their derivatives. daverage recomputes base intermediate
results s, n and r exactly as done in average, because they might be needed as
base inputs of derivatives. Since r is unused, its recomputation can be dropped
during later optimizations, but expensive intermediate results s and n are used
by ddiv:


ddiv :: Z → ΔZ → Z → ΔZ → ΔZ
ddiv a da b db = div (a ⊕ da) (b ⊕ db) ⊖ div a b


Function ddiv computes the difference between the updated and the original
result, so it needs its base inputs a and b. Hence, daverage must recompute s
and n and will be slower than average!

Typically, ILC derivatives are only efficient if they are self-maintainable: a
self-maintainable derivative does not inspect its base inputs, but only its change
inputs, so recomputation of its base inputs can be elided. Cai et al. [7] leave
efficient support for non-self-maintainable derivatives for future work.

But this problem is fixable: executing daverage xs dxs will compute exactly
the same s and n as executing average xs, so to avoid recomputation we must 
simply save s and n and reuse them. Hence, we CTS-convert each function f 
to a CTS function fC and a CTS derivative dfC: CTS function fC produces, 
together with its final result, a cache containing intermediate results, that the 
caller must pass to CTS derivative dfC. 

CTS-converting our example produces the following code, which requires no 
wasteful recomputation. 

type AverageC = (Z, SumC, Z, LengthC, Z, DivC)

averageC :: Bag Z → (Z, AverageC)
averageC xs =
  let (s, cs1) = sumC xs; (n, cn1) = lengthC xs; (r, cr1) = divC s n
  in (r, (s, cs1, n, cn1, r, cr1))

daverageC :: Bag Z → Δ(Bag Z) → AverageC → (ΔZ, AverageC)
daverageC xs dxs (s, cs1, n, cn1, r, cr1) =
  let (ds, cs2) = dsumC xs dxs cs1
      (dn, cn2) = dlengthC xs dxs cn1
      (dr, cr2) = ddivC s ds n dn cr1
  in (dr, ((s ⊕ ds), cs2, (n ⊕ dn), cn2, (r ⊕ dr), cr2))


For each function f, we introduce a type FC for its cache, such that a CTS
function fC has type A → (B, FC) and CTS derivative dfC has type A →
ΔA → FC → (ΔB, FC). Crucially, CTS derivatives like daverageC must return
an updated cache to ensure correct incrementalization, so that application of
further changes works correctly. In general, if (y1, c1) = fC x1 and (dy, c2) =
dfC x1 dx c1, then (y1 ⊕ dy, c2) must equal the result of the base function fC
applied to the updated input x1 ⊕ dx, that is (y1 ⊕ dy, c2) = fC (x1 ⊕ dx).
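
The cache invariant just stated can be checked on the running example with a small runnable sketch. As before, it assumes bags are lists, bag changes (the hypothetical type DBag) list inserted and removed elements, and integer changes are integer differences; the primitives' cache types are taken to be empty, and all definitions are written by hand rather than generated by CTS conversion.

```haskell
import Data.List ((\\))

type Bag = [Integer]
data DBag = DBag [Integer] [Integer]   -- inserted elements, removed elements

oplusBag :: Bag -> DBag -> Bag
oplusBag xs (DBag ins del) = (xs \\ del) ++ ins

-- Assumption: none of the primitives needs to cache anything.
data SumC = SumC deriving (Eq, Show)
data LengthC = LengthC deriving (Eq, Show)
data DivC = DivC deriving (Eq, Show)

sumC :: Bag -> (Integer, SumC)
sumC xs = (sum xs, SumC)

dsumC :: Bag -> DBag -> SumC -> (Integer, SumC)
dsumC _ (DBag ins del) SumC = (sum ins - sum del, SumC)

lengthC :: Bag -> (Integer, LengthC)
lengthC xs = (toInteger (length xs), LengthC)

dlengthC :: Bag -> DBag -> LengthC -> (Integer, LengthC)
dlengthC _ (DBag ins del) LengthC =
  (toInteger (length ins) - toInteger (length del), LengthC)

divC :: Integer -> Integer -> (Integer, DivC)
divC a b = (div a b, DivC)

ddivC :: Integer -> Integer -> Integer -> Integer -> DivC -> (Integer, DivC)
ddivC a da b db DivC = (div (a + da) (b + db) - div a b, DivC)

type AverageC = (Integer, SumC, Integer, LengthC, Integer, DivC)

averageC :: Bag -> (Integer, AverageC)
averageC xs =
  let (s, cs1) = sumC xs
      (n, cn1) = lengthC xs
      (r, cr1) = divC s n
  in (r, (s, cs1, n, cn1, r, cr1))

-- s and n are read from the cache; the base bag xs is never traversed.
daverageC :: Bag -> DBag -> AverageC -> (Integer, AverageC)
daverageC xs dxs (s, cs1, n, cn1, r, cr1) =
  let (ds, cs2) = dsumC xs dxs cs1
      (dn, cn2) = dlengthC xs dxs cn1
      (dr, cr2) = ddivC s ds n dn cr1
  in (dr, (s + ds, cs2, n + dn, cn2, r + dr, cr2))
```

With (y1, c1) = averageC xs1 and (dy, c2) = daverageC xs1 dxs c1, the pair (y1 + dy, c2) should coincide with averageC (xs1 `oplusBag` dxs), provided the removed elements of dxs occur in xs1.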

For CTS-converted functions, the cache type FC is a tuple of intermedi- 
ate results and caches of subcalls. For primitive functions like div, the cache 
type DivC could contain information needed for efficient computation of output 
changes. In the case of div, no additional information is needed. The definition of 
divC uses div and produces an empty cache, and the definition of ddivC follows 
the earlier definition for ddiv, except that we now pass along an empty cache. 


data DivC = DivC

divC :: Z → Z → (Z, DivC)
divC a b = (div a b, DivC)

ddivC :: Z → ΔZ → Z → ΔZ → DivC → (ΔZ, DivC)
ddivC a da b db DivC = (div (a ⊕ da) (b ⊕ db) ⊖ div a b, DivC)


Finally, we can rewrite average3 to incrementally compute y2 and y3:


ctsIncrementalAverage3 :: (Z, Z, Z)
ctsIncrementalAverage3 =
  let (y1, c1) = averageC xs1; (dy1, c2) = daverageC xs1 dxs1 c1
      y2 = y1 ⊕ dy1;           (dy2, c3) = daverageC xs2 dxs2 c2
      y3 = y2 ⊕ dy2
  in (y1, y2, y3)


Since functions of the same type translate to CTS functions of different types, 
in a higher-order language CTS translation is not always type-preserving; how- 
ever, this is not a problem for our case studies (Sect. 4); Sect. 4.1 shows how to 
map such functions, and we return to this problem in Sect. 4.4. 


3 Formalization 


We now formalize CTS-differentiation for an untyped Turing-complete λ-calculus,
and formally prove it sound with respect to differentiation. We also
give a novel proof of correctness for differentiation itself, since we cannot simply
adapt Cai et al. [7]’s proof to the new syntax: our language is untyped
and Turing-complete, while Cai et al. [7]’s proof assumed a strongly normalizing
simply-typed λ-calculus and relied on its naive set-theoretic denotational semantics.
Our entire formalization is mechanized using Coq [26]. For reasons of space,
some details are deferred to the appendix.


Terms
  a_t ::= let a_p = a_t in a_t      Let
        | a_T                       Tuple
        | f x                       Application
Nested tuples
  a_T ::= x                         Variable
        | x ⊕ dx                    Update
        | (a_T, ..., a_T)           Tuple
Patterns
  a_p ::= x                         Variable
        | (a_p, ..., a_p)           Tuple

Closed values
  a_v ::= a_E[λa_p. a_t]            Closure
        | (a_v, ..., a_v)           Tuple
        | ℓ                         Literal
        | p                         Primitive
        | 0_p                       Nil change for primitive
        | !a_v                      Replacement change
Value environments
  a_E ::= •                         Empty
        | a_E; x = a_v              Value binding
  j, k, n ∈ ℕ                       Step indexes


Fig. 1. Our language λ_L of lambda-lifted programs. Tuples can be nullary.


Transformations. We introduce and prove sound three term transformations,
namely differentiation, CTS translation and CTS differentiation, that take a
function to its corresponding (non-CTS) derivative, CTS function and CTS
derivative. Each CTS function produces a base output and a cache from a base
input, while each CTS derivative produces an output change and an updated
cache from an input, an input change and a base cache.

Proof technique. To show soundness, we prove that CTS functions and derivatives
simulate respectively non-CTS functions and derivatives. In turn, we formalize
(non-CTS) differentiation as well, and we prove differentiation sound with
respect to non-incremental evaluation. Overall, this shows that CTS functions
and derivatives are sound relative to non-incremental evaluation. Our presentation
proceeds in the converse order: first, we present differentiation, formulated
as a variant of Cai et al. [7]’s definition; then, we study CTS differentiation.

By using logical relations, we simplify significantly the setup of Cai et al. [7]. 
To handle an untyped language, we employ step-indexed logical relations. 
Besides, we conduct our development with big-step operational semantics 
because that choice simplifies the correctness proof for CTS conversion. Using 
big-step semantics for a Turing-complete language restricts us to terminating
computations. But that is not a problem: to show incrementalization is correct,
we need only consider computations that terminate on both old and new inputs,
following Acar et al. [3] (we compare with their work in Sect. 5).


Structure of the formalization. Section 3.1 introduces the syntax of the language
λ_L we consider in this development, and introduces its four sublanguages λ_AL,
λ_IAL, λ_CAL and λ_ICAL. Section 3.2 presents the syntax and the semantics of
λ_AL, the source language for our transformations. Section 3.3 defines differentiation
and its target language λ_IAL, and Sect. 3.4 proves differentiation correct.
Section 3.5 defines CTS conversion, comprising CTS translation and CTS differentiation,
and their target languages λ_CAL and λ_ICAL. Section 3.6 presents the
semantics of λ_CAL. Finally, Sect. 3.7 proves CTS conversion correct.


Notations. We write X̄ for a sequence of X of some unspecified length,
X1, ..., Xm.


3.1 Syntax for λ_L


A superlanguage. To simplify our transformations, we require input programs to 
have been lambda-lifted [15] and converted to A’-normal form (A’NF). Lambda- 
lifted programs are convenient because they allow us to avoid a specific treatment 
for free variables in transformations. A’NF is a minor variant of ANF [24], where 
every result is bound to a variable before use; unlike ANF, we also bind the result 
of the tail call. Thus, every result can be stored in a cache by CTS conversion
and reused later (as described in Sect. 2). This requirement is not onerous: A’NF 
is a minimal variant of ANF, and lambda-lifting and ANF conversion are routine 
in compilers for functional languages. Most examples we show are in this form. 

In contrast, our transformation’s outputs are lambda-lifted but not in A’NF. 
For instance, we restrict base functions to take exactly one argument—a base 
input. As shown in Sect. 2.1, CTS functions take instead two arguments—a base 
input and a cache—and CTS derivatives take three arguments—an input, an 
input change, and a cache. We could normalize transformation outputs to inhabit 
the source language and follow the same invariants, but this would complicate 
our proofs for little benefit. Hence, we do not prescribe transformation outputs
to satisfy the same invariants, and we rather describe transformation outputs
through separate grammars.

As a result of this design choice, we consider languages for base programs, 
derivatives, CTS programs and CTS derivatives. In our Coq mechanization, we 
formalize those as four separate languages, saving us many proof steps to check 
the validity of required structural invariants. For simplicity, in this paper we
define a single language called λ_L (for λ-Lifted). This language satisfies invariants
common to all these languages (including some of the A’NF invariants). Then,
we define sublanguages of λ_L. We describe the semantics of λ_L informally, and
we only formalize the semantics of its sublanguages.


Syntax for terms. The λ_L language is a relatively conventional lambda-lifted λ-calculus
with a limited form of pattern matching on tuples. The syntax for terms
and values is presented in Fig. 1. We separate terms and values in two distinct
syntactic classes because we use big-step operational semantics. Our let-bindings
are non-recursive as usual, and support shadowing. Terms cannot contain λ-expressions
directly, but only refer to closures through the environment, and
similarly for literals and primitives; we elaborate on this in Sect. 3.2. We do
not introduce case expressions, but only bindings that destructure tuples, both
in let-bindings and λ-expressions of closures. Our semantics does not assign
meaning to match failures, but pattern matching is only used in generated
programs and our correctness proofs ensure that the matches always succeed.
We allow tuples to contain terms of form x ⊕ dx, which update base values x
with changes in dx, because A’NF-converting these updates is not necessary to
the transformations. We often inspect the result of a function call “f x”, which
is not a valid term in our syntax. Hence, we write “@(f, x)” as syntactic sugar
for “let y = f x in y” with y chosen fresh.


Syntax for closed values. A closed value is either a closure, a tuple of values,
a literal, a primitive, a nil change for a primitive or a replacement change. A
closure is a pair of an evaluation environment E and a λ-abstraction closed
with respect to E. The set of available literals ℓ is left abstract. It may contain
usual first-order literals like integers. We also leave abstract the primitives p like
if-then-else or projections of tuple components. Each primitive p comes with
a nil change, which is its derivative as explained in Sect. 2. A change value can
also represent a replacement by some closed value a_v. Replacement changes are
not produced by static differentiation but are useful for clients of derivatives: we
include them in the formalization to make sure that they are not incompatible
with our system. As usual, environments E map variables to closed values.
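
The syntactic classes described in the last two paragraphs can be transcribed as Haskell data types. This is only a sketch of one possible encoding, not the Coq mechanization: all constructor names are hypothetical, literals are assumed to be integers, primitives are identified by name, and environments are association lists with the most recent binding first.

```haskell
type Name = String

-- Patterns: a variable or a (possibly nullary) tuple of patterns.
data Pat = PVar Name | PTuple [Pat]
  deriving (Eq, Show)

-- Nested tuples: variables, updates x ⊕ dx, or tuples of nested tuples.
data NTuple = TVar Name | TUpdate Name Name | TTuple [NTuple]
  deriving (Eq, Show)

-- Terms: let-bindings, nested tuples, and applications of variables.
data Term
  = Let Pat Term Term
  | Tup NTuple
  | App Name Name
  deriving (Eq, Show)

-- Closed values, including the two change forms shared by all sublanguages.
data Value
  = Closure Env Pat Term   -- environment plus λ-abstraction
  | VTuple [Value]
  | Lit Integer            -- assumption: literals are integers
  | Prim Name              -- assumption: primitives are identified by name
  | NilPrim Name           -- nil change 0_p for primitive p
  | Replace Value          -- replacement change !v
  deriving (Eq, Show)

type Env = [(Name, Value)]

-- The sugar @(f, x), i.e. "let y = f x in y" with y distinct from f and x.
call :: Name -> Name -> Term
call f x = Let (PVar y) (App f x) (Tup (TVar y))
  where
    y = fresh (0 :: Int)
    fresh i
      | v `elem` [f, x] = fresh (i + 1)
      | otherwise       = v
      where v = "y" ++ show i
```

Because terms contain only variables in application position, closures, literals and primitives can reach a program only through Env, as the text describes.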


Sublanguages of λ_L. The source language for all our transformations is a sublanguage
of λ_L named λ_AL, where A stands for A’NF. To each transformation we
associate a target language, which matches the transformation image. The target
language for CTS conversion is named λ_CAL, where “C” stands for CTS. The target
languages of differentiation and CTS differentiation are called, respectively,
λ_IAL and λ_ICAL, where the “I” stands for incremental.


3.2 The Source Language λ_AL


We show the syntax of λ_AL in Fig. 2. As said above, λ_AL is a sublanguage of
λ_L denoting lambda-lifted base terms in A’NF. With no loss of generality, we
assume that all bound variables in λ_AL programs and closures are distinct. The
step-indexed big-step semantics (Fig. 3) for base terms is defined by the judgment
written E ⊢ t ⇓_n v (where n can be omitted) and pronounced “Under
environment E, base term t evaluates to closed value v in n steps.” Intuitively,
our step-indexes count the number of “nodes” of a big-step derivation.¹ As they
are relatively standard, we defer the explanations of these rules to Appendix B.


Term differentiation   dt = D^ι(t)
  D^ι(x) = dx
  D^ι(let y = f x in t) = let y = f x, dy = df x dx in D^ι(t)
  D^ι(let y = (x̄) in t) = let y = (x̄), dy = (dx̄) in D^ι(t)

Value differentiation   dv = D^ι(v)
  D^ι(E[λx. t]) = D^ι(E)[λx dx. D^ι(t)]
  D^ι((v̄)) = (D^ι(v), ..., D^ι(v))
  D^ι(ℓ) = 0_ℓ
  D^ι(p) = 0_p

Environment differentiation   dE = D^ι(E)
  D^ι(•) = •
  D^ι(E; x = v) = D^ι(E); x = v; dx = D^ι(v)

Base/updated environment   E = ⌊dE⌋_i
  ⌊•⌋_i = •                                    i = 1, 2
  ⌊dE; x = v; dx = dv⌋_i = ⌊dE⌋_i; x = v'
    where v' = v if i = 1, and v' = v ⊕ dv if i = 2

λ_IAL change terms
  dt ::= dx | let y = f x, dy = df x dx in dt | let y = (x̄), dy = (dx̄) in dt
λ_IAL change values
  dv ::= (dv̄) | dE[λx dx. dt] | dℓ | !v | 0_p
λ_IAL change environments
  dE ::= • | dE; x = v; dx = dv
λ_AL base terms
  t ::= x | let y = f x in t | let y = (x̄) in t
λ_AL closed values
  v ::= (v̄) | E[λx. t] | ℓ | p
λ_AL value environments
  E ::= • | E; x = v


Fig. 2. Static differentiation D^ι(–); syntax of its target language λ_IAL, tailored to the
output of differentiation; syntax of its source language λ_AL. We assume that in λ_IAL the
same let binds both y and dy and that α-renaming preserves this invariant. We also
define the base environment ⌊dE⌋_1 and the updated environment ⌊dE⌋_2 of a change
environment dE.


Expressiveness. A closure in the base environment can be used to represent a
top-level definition. Since environment entries can point to primitives, we need
no syntax to directly represent calls of primitives in the syntax of base terms.
To encode in our syntax a program with top-level definitions and a term to be
evaluated representing the entry point, one can produce a term t representing the
entry point together with an environment E containing as values any top-level
definitions, primitives and literals used in the program. Semi-formally, given an
environment E0 mentioning needed primitives and literals, and a list D of top-level
function definitions of the form f = λx. t defined in terms of E0, we can produce a base
environment E = ℒ(D), with ℒ defined by:


ℒ(•) = E0 and ℒ(D, f = λx. t) = E, f = E[λx. t] where ℒ(D) = E


[SVar]
  ----------------
  E ⊢ x ⇓_1 E(x)

[STuple]
  E; y = (E(x̄)) ⊢ t ⇓_n v
  --------------------------------
  E ⊢ let y = (x̄) in t ⇓_{n+1} v

[SPrimitiveCall]
  E(f) = p    E; y = δ_p(E(x)) ⊢ t ⇓_n v
  -----------------------------------------
  E ⊢ let y = f x in t ⇓_{n+1} v

[SClosureCall]
  E(f) = E_f[λx. t_f]    E_f; x = E(x) ⊢ t_f ⇓_m v_y    E; y = v_y ⊢ t ⇓_n v
  ----------------------------------------------------------------------------
  E ⊢ let y = f x in t ⇓_{m+n+1} v


Fig. 3. Step-indexed big-step semantics for base terms of source language λ_AL.


¹ It is more common to count instead small-step evaluation steps [3,4], but our choice
simplifies some proofs and makes a minor difference in others.


Correspondingly, we extend all our term transformations to values and environments
to transform such encoded top-level definitions.

Our mechanization can encode n-ary functions “λ(x1, x2, ..., xn). t” through
unary functions that accept tuples; we encode partial application using a curry
primitive such that, essentially, curry f x y = f (x, y); suspended partial applications
are represented as closures. This encoding does not support currying
efficiently; we further discuss this limitation in Sect. 4.4.

Control operators, like recursion combinators or branching, can be introduced 
as primitive operations as well. If the branching condition changes, expressing the 
output change in general requires replacement changes. Similarly to branching 
we can add tagged unions. 

To check the assertions of the last two paragraphs, the Coq development 
contains the definition of a curry primitive as well as a primitive for a fixpoint 
combinator, allowing general recursion and recursive data structures as well. 
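
The step-indexed big-step semantics of base terms (Fig. 3) can be read as a small interpreter. The following is a minimal sketch, not the Coq development: it assumes that each rule application counts as one step, that literals are integers, and that the only primitives are the hypothetical "add" and "fst"; stuck evaluations return Nothing, and non-terminating programs make eval loop.

```haskell
type Name = String

-- Base terms: x | let y = f x in t | let y = (x1, ..., xn) in t
data Term
  = Var Name
  | LetCall Name Name Name Term
  | LetTuple Name [Name] Term
  deriving (Eq, Show)

-- Closed values: closures, tuples, literals, primitives.
data Value
  = Closure Env Name Term
  | Tuple [Value]
  | Lit Integer
  | Prim Name
  deriving (Eq, Show)

-- Most recent binding first, so lookup implements shadowing.
type Env = [(Name, Value)]

-- Interpretation of primitives (a hypothetical, tiny set).
delta :: Name -> Value -> Maybe Value
delta "add" (Tuple [Lit a, Lit b]) = Just (Lit (a + b))
delta "fst" (Tuple (v : _))        = Just v
delta _ _                          = Nothing

-- eval env t = Just (v, n) when t evaluates to v in n steps under env.
eval :: Env -> Term -> Maybe (Value, Int)
eval env (Var x) = do
  v <- lookup x env
  pure (v, 1)
eval env (LetTuple y xs t) = do
  vs <- mapM (`lookup` env) xs
  (v, n) <- eval ((y, Tuple vs) : env) t
  pure (v, n + 1)
eval env (LetCall y f x t) = do
  vf <- lookup f env
  vx <- lookup x env
  case vf of
    Prim p -> do
      vy <- delta p vx
      (v, n) <- eval ((y, vy) : env) t
      pure (v, n + 1)
    Closure envF x' tf -> do
      (vy, m) <- eval ((x', vx) : envF) tf
      (v, n) <- eval ((y, vy) : env) t
      pure (v, m + n + 1)
    _ -> Nothing
```

Top-level definitions are encoded as described above: a closure such as double is simply placed in the environment under its name, next to the primitives and literals it uses.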


3.3 Static Differentiation from λ_AL to λ_IAL


Previous work [7] defines static differentiation for simply-typed λ-calculus terms.
Figure 2 transposes differentiation as a transformation from λ_AL to λ_IAL and
defines λ_IAL’s syntax.

Differentiating a base term t produces a change term D^ι(t), its derivative.
Differentiating final result variable x produces its change variable dx. Differentiation
copies each binding of an intermediate result y to the output and adds a
new binding for its change dy. If y is bound to tuple (x̄), then dy will be bound
to the change tuple (dx̄). If y is bound to function application “f x”, then dy will
be bound to the application of function change df to input x and its change dx.
We explain differentiation of environments D^ι(E) later in this section.


[SDVar]
  --------------------
  dE ⊢ dx ⇓_1 dE(dx)

[SDTuple]
  dE(x̄, dx̄) = v̄_x, dv̄_x
  dE; y = (v̄_x); dy = (dv̄_x) ⊢ dt ⇓_n dv
  ------------------------------------------------
  dE ⊢ let y = (x̄), dy = (dx̄) in dt ⇓_{n+1} dv

[SDReplaceCall]
  dE(df) = !v_f
  ⌊dE⌋_1 ⊢ @(f, x) ⇓_m v_y    ⌊dE⌋_2 ⊢ @(f, x) ⇓_n v_y'
  dE; y = v_y; dy = !v_y' ⊢ dt ⇓_p dv
  ------------------------------------------------------
  dE ⊢ let y = f x, dy = df x dx in dt ⇓_{m+n+p+1} dv

[SDPrimitiveNil]
  dE(f, df) = p, 0_p    dE(x, dx) = v_x, dv_x
  dE; y = δ_p(v_x); dy = Δ_p(v_x, dv_x) ⊢ dt ⇓_n dv
  --------------------------------------------------
  dE ⊢ let y = f x, dy = df x dx in dt ⇓_{n+1} dv

[SDClosureChange]
  dE(f, df) = E_f[λx. t_f], dE_f[λx dx. dt_f]
  dE(x, dx) = v_x, dv_x    E_f; x = v_x ⊢ t_f ⇓_m v_y
  dE_f; x = v_x; dx = dv_x ⊢ dt_f ⇓_n dv_y    dE; y = v_y; dy = dv_y ⊢ dt ⇓_p dv
  -------------------------------------------------------------------------------
  dE ⊢ let y = f x, dy = df x dx in dt ⇓_{m+n+p+1} dv


Fig. 4. Step-indexed big-step semantics for the change terms of λ_IAL.


Evaluating D^ι(t) recomputes all intermediate results computed by t. This
recomputation will be avoided through cache-transfer style in Sect. 3.5. A comparison
with the original static differentiation [7] can be found in Appendix A.
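
Term differentiation as described in this section is a direct structural recursion, which the following sketch spells out. The data types and the function derive are hypothetical names; change variables are obtained by prefixing "d", which assumes that source programs do not already use such names.

```haskell
type Name = String

-- Base terms: x | let y = f x in t | let y = (x1, ..., xn) in t
data Term
  = Var Name
  | LetCall Name Name Name Term
  | LetTuple Name [Name] Term
  deriving (Eq, Show)

-- Change terms: each let binds a base variable and its change together.
data DTerm
  = DVar Name
    -- let y = f x, dy = df x dx in dt (fields: y f x dy df dx dt)
  | DLetCall Name Name Name Name Name Name DTerm
    -- let y = (xs), dy = (dxs) in dt (fields: y xs dy dxs dt)
  | DLetTuple Name [Name] Name [Name] DTerm
  deriving (Eq, Show)

-- Naming convention for change variables: x becomes dx.
d :: Name -> Name
d x = 'd' : x

-- Static differentiation of terms.
derive :: Term -> DTerm
derive (Var x)           = DVar (d x)
derive (LetCall y f x t) = DLetCall y f x (d y) (d f) (d x) (derive t)
derive (LetTuple y xs t) = DLetTuple y xs (d y) (map d xs) (derive t)
```

The output keeps every base binding of the input, which is why evaluating a derivative recomputes all intermediate results of the base program.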


Semantics for λ_IAL. We move on to define how λ_IAL change terms evaluate
to change values. We start with the necessary definitions and operations on
changes: change values dv, change environments dE, and the
update operator ⊕.

Closed change values dv are particular λ_L values a_v. They are either a closure
change, a tuple change, a literal change, a replacement change or a primitive nil
change. A closure change is a closure containing a change environment dE and
a λ-abstraction expecting a value and a change value as arguments to evaluate a
change term into an output change value. An evaluation environment dE follows
the same structure as let-bindings of change terms: it binds variables to closed
values and each variable x is immediately followed by a binding for its associated
change variable dx. As with let-bindings of change terms, α-renamings in an
environment dE must rename dx into dy if x is renamed into y. We define the
update operator ⊕ to update a value with a change. This operator is a partial
function written “v ⊕ dv”, defined as follows:

v1 ⊕ !v2 = v2
ℓ ⊕ dℓ = δ_⊕(ℓ, dℓ)
E[λx. t] ⊕ dE[λx dx. dt] = (E ⊕ dE)[λx. t]
(v1, ..., vn) ⊕ (dv1, ..., dvn) = (v1 ⊕ dv1, ..., vn ⊕ dvn)
p ⊕ 0_p = p


where (E; x = v) ⊕ (dE; x = v; dx = dv) = ((E ⊕ dE); x = (v ⊕ dv)).

Replacement changes can be used to update all values (literals, tuples, primitives
and closures), while tuple changes can only update tuples, literal changes
can only update literals, primitive nil changes can only update primitives and closure
changes can only update closures. A replacement change overrides the current
value v with a new one v’. On literals, ⊕ is defined via some interpretation
function δ_⊕, which takes a literal and a literal change to produce an updated
literal. Change update for a closure ignores dt instead of computing something
like dE[t ⊕ dt]. This may seem surprising, but we only need ⊕ to behave well
for valid changes (as shown by Theorem 3.1): for valid closure changes, dt must
anyway behave similarly to D^ι(t), which Cai et al. [7] show to be a nil change.
Hence, t ⊕ D^ι(t) and t ⊕ dt both behave like t, so ⊕ can ignore dt and only consider
environment updates. This definition also avoids having to modify terms at
runtime, which would be difficult to implement safely. We could also implement
f ⊕ df as a function that invokes both f and df on its argument, as done by Cai
et al. [7], but we believe that would be less efficient when ⊕ is used at runtime.
As we discuss in Sect. 3.4, we restrict validity to avoid this runtime overhead.
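
The partiality of the update operator and its treatment of closures can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the mechanized definition: literals are integers and literal changes are integer differences (so the literal interpretation function is addition), term bodies are kept opaque, and the names Change, oplus and oplusEnv are hypothetical.

```haskell
type Name = String

-- Term bodies are opaque here: the update operator never inspects them.
newtype Term = Term String deriving (Eq, Show)
newtype DTerm = DTerm String deriving (Eq, Show)

data Value
  = Closure Env Name Term
  | Tuple [Value]
  | Lit Integer
  | Prim Name
  deriving (Eq, Show)

data Change
  = Replace Value                  -- replacement change !v
  | DLit Integer                   -- literal change (assumed: a difference)
  | DClosure DEnv Name Name DTerm  -- closure change with parameters x and dx
  | DTuple [Change]
  | NilPrim Name                   -- nil change 0_p
  deriving (Eq, Show)

type Env = [(Name, Value)]
type DEnv = [(Name, Value, Name, Change)]   -- x = v; dx = dv

-- Partial update operator: Nothing when value and change do not fit.
oplus :: Value -> Change -> Maybe Value
oplus _ (Replace v2) = Just v2
oplus (Lit l) (DLit dl) = Just (Lit (l + dl))
oplus (Closure e x t) (DClosure de x' _ _)   -- the change body dt is ignored
  | x == x' = do
      e' <- oplusEnv e de
      pure (Closure e' x t)
oplus (Tuple vs) (DTuple dvs)
  | length vs == length dvs = Tuple <$> sequence (zipWith oplus vs dvs)
oplus (Prim p) (NilPrim p')
  | p == p' = Just (Prim p)
oplus _ _ = Nothing

-- Environments are updated pointwise; base bindings must agree.
oplusEnv :: Env -> DEnv -> Maybe Env
oplusEnv [] [] = Just []
oplusEnv ((x, v) : e) ((x', v', _, dv) : de)
  | x == x' && v == v' = do
      u <- oplus v dv
      e' <- oplusEnv e de
      pure ((x, u) : e')
oplusEnv _ _ = Nothing
```

Updating a closure only rewrites its captured environment, so no term is ever modified at runtime, in line with the discussion above.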

Having given these definitions, we show in Fig. 4 a step-indexed big-step
semantics for change terms, defined through judgment dE ⊢ dt ⇓_n dv (where n
can be omitted). This judgment is pronounced “Under the environment dE, the
change term dt evaluates into the closed change value dv in n steps.” Rules
[SDVar] and [SDTuple] are unsurprising. To evaluate function calls in let-bindings
“let y = f x, dy = df x dx in dt” we have three rules, depending on
the shape of dE(df). These rules all recompute the value v_y of y in the original
environment, but compute differently the change dy to y. If dE(df) replaces
the value of f, [SDReplaceCall] recomputes v_y' = f x from scratch in the new
environment, and binds dy to !v_y' when evaluating the let body. If dE(df) is the
nil change for primitive p, [SDPrimitiveNil] computes dy by running p’s derivative
through function Δ_p(-). If dE(df) is a closure change, [SDClosureChange]
invokes it normally to compute its change dv_y. As we show, if the closure change
is valid, its body behaves like f’s derivative, hence incrementalizes f correctly.

Closure changes with non-nil environment changes represent partial application
of derivatives to non-nil changes; for instance, if f takes a pair and dx is a
non-nil change, $0_{curry}\ f\ df\ x\ dx$ constructs a closure change containing
dx, using the derivative of curry mentioned in Sect. 3.2. In general, such
closure changes do not arise from the rules we show, only from derivatives of
primitives.
3.4 A New Soundness Proof for Static Differentiation


In this section, we show that static differentiation is sound (Theorem 3.3) and
that Eq. (1) holds:


$f\ a_2 = f\ a_1 \oplus \mathcal{D}^\iota(f)\ a_1\ da$    (1)


whenever da is a valid change from $a_1$ to $a_2$ (as defined later). One might
want to prove this equation assuming only that $a_1 \oplus da = a_2$, but this
is false in general. A direct proof by induction on terms fails in the case for
application (ultimately because $f_1 \oplus df = f_2$ and $a_1 \oplus da = a_2$
do not imply that $f_1\ a_1 \oplus df\ a_1\ da = f_2\ a_2$). As usual, this can
be fixed by introducing a logical relation. We call ours validity: a function
change is valid if it turns valid input changes into valid output changes.
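
To make Eq. (1) concrete, here is a minimal Haskell sketch for a first-order case. It assumes integers with additive changes and a hand-written derivative df; the names oplus, f, df and eq1Holds are hypothetical. For first-order data such as integers, every change with $a_1 \oplus da = a_2$ is valid, so the subtlety about function changes mentioned above does not arise here.

```haskell
-- Illustrative only: integers with additive changes.
type Change = Int

oplus :: Int -> Change -> Int
oplus a da = a + da

f :: Int -> Int
f a = a * a

-- Hand-written derivative of f: the change from f a to f (a + da),
-- obtained from (a + da)^2 - a^2 = 2*a*da + da^2.
df :: Int -> Change -> Change
df a da = 2 * a * da + da * da

-- Eq. (1) specialized to f, with a2 = a1 `oplus` da.
eq1Holds :: Int -> Change -> Bool
eq1Holds a1 da = f (a1 `oplus` da) == f a1 `oplus` df a1 da
```

Evaluating eq1Holds on a range of inputs and changes is a quick sanity check that a candidate derivative agrees with recomputation.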


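• $d\ell \rhd_n \ell \hookrightarrow \delta_\oplus(\ell, d\ell)$
• $!v_2 \rhd_n v_1 \hookrightarrow v_2$
• $0_p \rhd_n p \hookrightarrow p$


• $(dv_1, \ldots, dv_m) \rhd_n (v_1, \ldots, v_m) \hookrightarrow (v'_1, \ldots, v'_m)$
  if and only if $(v_1, \ldots, v_m) \oplus (dv_1, \ldots, dv_m) = (v'_1, \ldots, v'_m)$
  and $\forall k < n, \forall i \in [1 \ldots m],\ dv_i \rhd_k v_i \hookrightarrow v'_i$


• $dE[\lambda x\ dx.\ dt] \rhd_n E_1[\lambda x.\ t] \hookrightarrow E_2[\lambda x.\ t]$
  if and only if $E_2 = E_1 \oplus dE$ and
  $\forall k < n, v_1, dv, v_2$,
  if $dv \rhd_k v_1 \hookrightarrow v_2$ then
  $(dE; x = v_1; dx = dv \vdash dt) \blacktriangleright_k (E_1; x = v_1 \vdash t) \hookrightarrow (E_2; x = v_2 \vdash t)$


• $(dE \vdash dt) \blacktriangleright_n (E_1 \vdash t_1) \hookrightarrow (E_2 \vdash t_2)$
  if and only if $\forall k < n, v_1, v_2$,
  $E_1 \vdash t_1 \Downarrow_k v_1$ and $E_2 \vdash t_2 \Downarrow v_2$ implies that
  $\exists dv,\ dE \vdash dt \Downarrow dv \wedge dv \rhd_{n-k} v_1 \hookrightarrow v_2$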


Fig. 5. Step-indexed validity, through judgments for values and for terms. 


Static differentiation is only sound on input changes that are valid. Cai
et al. [7] show soundness for a strongly normalizing simply-typed λ-calculus
using denotational semantics. Using an operational semantics, we generalize this
result to an untyped and Turing-complete language, so we must turn to a
step-indexed logical relation [3,4].


Validity as a step-indexed logical relation. We say that “dv is a valid change
from $v_1$ to $v_2$, up to k steps” and write


$dv \rhd_k v_1 \hookrightarrow v_2$


to mean that dv is a change from $v_1$ to $v_2$ and that dv is a valid
description of the differences between $v_1$ and $v_2$, with validity tested
with up to k steps. This relation approximates validity; if a change dv is valid
at all approximations, it is simply valid (between $v_1$ and $v_2$); we then
write $dv \rhd v_1 \hookrightarrow v_2$ (omitting the step-index k) to mean that
validity holds at all step-indexes. We similarly omit step-indexes k from other
step-indexed relations when they hold for all k.
To justify this intuition of validity, we show that a valid change from $v_1$
to $v_2$ indeed goes from $v_1$ to $v_2$ (Theorem 3.1), and that if a change is
valid up to k steps, it is also valid up to fewer steps (Lemma 3.2).


Theorem 3.1 ($\oplus$ agrees with validity)
If $dv \rhd_k v_1 \hookrightarrow v_2$ holds for all $k > 0$, then $v_1 \oplus dv = v_2$.


Lemma 3.2 (Downward-closure)
If $N \geq n$, then $dv \rhd_N v_1 \hookrightarrow v_2$ implies $dv \rhd_n v_1 \hookrightarrow v_2$.


Crucially, Theorem 3.1 enables (a) computing $v_2$ from a valid change and its
source, and (b) showing Eq. (1) through validity. As discussed, $\oplus$ ignores
changes to closure bodies to be faster, which is only sound if those changes are
nil; to ensure Theorem 3.1 still holds, validity on closure changes must be
adapted accordingly and forbid non-nil changes to closure bodies. This choice,
while unusual, does not affect our results: if input changes do not modify
closure bodies, intermediate changes will not modify closure bodies either.
Logical relation experts might regard this as a domain-specific invariant we add
to our relation. Alternatives are discussed by Giarrusso [10, Appendix C].

As usual with step-indexing, validity is defined by well-founded induction 
over naturals ordered by <; to show well-foundedness we observe that evaluation 
always takes at least one step. 

Validity for values, terms and environments is formally defined by cases in
Fig. 5. First, a literal change $d\ell$ is a valid change from $\ell$ to
$\ell \oplus d\ell = \delta_\oplus(\ell, d\ell)$. Since the function
$\delta_\oplus$ is partial, the relation only holds for the literal changes
$d\ell$ which are valid changes for $\ell$. Second, a replacement change $!v_2$
is always a valid change from any value $v_1$ to $v_2$. Third, a primitive nil
change is a valid change between any primitive and itself. Fourth, a tuple
change is valid up to step n if each of its components is valid up to any step
strictly less than n. Fifth, we define validity for closure changes. Roughly
speaking, this statement means that a closure change is valid if (i) its
environment change dE is valid for the original closure environment $E_1$ and
for the new closure environment $E_2$; and (ii) when applied to related values,
the closure bodies t are related by dt, as defined by the auxiliary judgment
$(dE \vdash dt) \blacktriangleright_n (E_1 \vdash t_1) \hookrightarrow (E_2 \vdash t_2)$
for validity between terms under related environments (defined in Appendix C).
As usual with step-indexed logical relations, in the definition for this
judgment about terms, the number k of steps required to evaluate the term $t_1$
is subtracted from the number of steps n that can be used to relate the outcomes
of the term evaluations.


Soundness of differentiation. We can state a soundness theorem for
differentiation without mentioning step-indexes; thanks to this theorem, we can
compute the updated result $v_2$ not by rerunning a computation, but by updating
the base result $v_1$ with the result change dv that we compute through a
derivative on the input change. A corollary shows Eq. (1).




Theorem 3.3 (Soundness of differentiation in $\lambda_{AL}$). If dE is a valid
change environment from base environment $E_1$ to updated environment $E_2$,
that is $dE \rhd E_1 \hookrightarrow E_2$, and if t converges both in the base
and updated environment, that is $E_1 \vdash t \Downarrow v_1$ and
$E_2 \vdash t \Downarrow v_2$, then $\mathcal{D}^\iota(t)$ evaluates under the
change environment dE to a valid change dv between base result $v_1$ and updated
result $v_2$, that is $dE \vdash \mathcal{D}^\iota(t) \Downarrow dv$,
$dv \rhd v_1 \hookrightarrow v_2$ and $v_1 \oplus dv = v_2$.


We must first show that derivatives map input changes valid up to k steps
to output changes valid up to k steps, that is, the fundamental property of our
step-indexed logical relation:


Lemma 3.4 (Fundamental Property)
For each n, if $dE \rhd_n E_1 \hookrightarrow E_2$ then
$(dE \vdash \mathcal{D}^\iota(t)) \blacktriangleright_n (E_1 \vdash t) \hookrightarrow (E_2 \vdash t)$.


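Translation of terms: $M = \mathcal{T}_t(t')$
  $\mathcal{T}_t(\mathbf{let}\ y = f\ x\ \mathbf{in}\ t') = \mathbf{let}\ y, c^y_{fx} = f\ x\ \mathbf{in}\ \mathcal{T}_t(t')$
  $\mathcal{T}_t(\mathbf{let}\ y = (\overline{x})\ \mathbf{in}\ t') = \mathbf{let}\ y = (\overline{x})\ \mathbf{in}\ \mathcal{T}_t(t')$
  $\mathcal{T}_t(x) = (x, \mathcal{C}(t))$

Cache of a term: $C = \mathcal{C}(t)$
  $\mathcal{C}(\mathbf{let}\ y = f\ x\ \mathbf{in}\ t) = ((\mathcal{C}(t), y), c^y_{fx})$
  $\mathcal{C}(\mathbf{let}\ y = (\overline{x})\ \mathbf{in}\ t) = (\mathcal{C}(t), y)$
  $\mathcal{C}(x) = ()$

Translation of values: $V = \mathcal{T}(v)$
  $\mathcal{T}((\overline{v})) = (\overline{\mathcal{T}(v)})$
  $\mathcal{T}(E[\lambda x.\ t]) = \mathcal{T}(E)[\lambda x.\ \mathcal{T}_t(t)]$
  $\mathcal{T}(\ell) = \ell$
  $\mathcal{T}(p) = p$

Base terms
  $M ::= \mathbf{let}\ y, c^y_{fx} = f\ x\ \mathbf{in}\ M \mid \mathbf{let}\ y = (\overline{x})\ \mathbf{in}\ M \mid (x, C)$
Cache terms/patterns
  $C ::= () \mid (C, c^y_{fx}) \mid (C, x)$
Closed values
  $V ::= (\overline{V}) \mid F[\lambda x.\ M] \mid \ell \mid p$
Cache values
  $V_c ::= () \mid (V_c, V_c) \mid (V_c, V)$
Evaluation environments
  $F ::= \varepsilon \mid F; D_v$
Base environment entries
  $D_v ::= x = V \mid c^y_{fx} = V_c$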


Fig. 6. Cache-Transfer Style translation and syntax of its target language $\lambda_{CAL}$.


3.5 CTS Conversion 


Figures 6 and 7 define both the syntax of $\lambda_{CAL}$ and $\lambda_{ICAL}$
and CTS conversion. The latter comprises CTS differentiation $\mathcal{D}(-)$,
from $\lambda_{AL}$ to $\lambda_{ICAL}$, and CTS translation $\mathcal{T}(-)$,
from $\lambda_{AL}$ to $\lambda_{CAL}$.


Syntax definitions for the target languages $\lambda_{CAL}$ and $\lambda_{ICAL}$.
Terms of $\lambda_{CAL}$ again follow λ-lifted A'NF, like $\lambda_{AL}$, except
that a let-binding for a function application “f x” now binds an extra cache
identifier $c^y_{fx}$ besides output y. Cache identifiers have non-standard
syntax: each can be seen as a triple that refers to the value identifiers f, x
and y. Hence, an α-renaming of one of these three identifiers must refresh the
cache identifier accordingly. Result terms explicitly return cache C through
syntax (x, C). Caches are encoded through nested tuples, but they are in fact a
tree-like data structure that is isomorphic to an execution trace. This trace
contains both immediate values and the execution traces of nested function
calls.

The syntax for $\lambda_{ICAL}$ matches the image of the CTS derivative and
witnesses the CTS discipline followed by the derivatives: to determine dy, the
derivative of f evaluated at point x with change dx expects the cache produced
by evaluating y in the base term. The derivative returns the updated cache,
which contains the intermediate results that would be gathered by the evaluation
of $f\ (x \oplus dx)$. The result term of every change term returns the computed
change and a cache update dC, where each value identifier x of the input cache
is updated with its corresponding change dx.


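Differentiation of terms: $dM = \mathcal{D}_t(t')$
  $\mathcal{D}_t(\mathbf{let}\ y = f\ x\ \mathbf{in}\ t') = \mathbf{let}\ dy, c^y_{fx} = df\ x\ dx\ c^y_{fx}\ \mathbf{in}\ \mathcal{D}_t(t')$
  $\mathcal{D}_t(\mathbf{let}\ y = (\overline{x})\ \mathbf{in}\ t') = \mathbf{let}\ dy = (\overline{dx})\ \mathbf{in}\ \mathcal{D}_t(t')$
  $\mathcal{D}_t(x) = (dx, \mathcal{U}(t))$

Cache update of a term: $dC = \mathcal{U}(t)$
  $\mathcal{U}(\mathbf{let}\ y = f\ x\ \mathbf{in}\ t) = ((\mathcal{U}(t), y \oplus dy), c^y_{fx})$
  $\mathcal{U}(\mathbf{let}\ y = (\overline{x})\ \mathbf{in}\ t) = (\mathcal{U}(t), y \oplus dy)$
  $\mathcal{U}(x) = ()$

Differentiation of change values: $dV = \mathcal{T}(dv)$
  $\mathcal{T}((\overline{dv})) = (\overline{\mathcal{T}(dv)})$
  $\mathcal{T}(dE[\lambda x\ dx.\ \mathcal{D}^\iota(t)]) = \mathcal{T}(dE)[\lambda x\ dx\ \mathcal{C}(t).\ \mathcal{D}_t(t)]$
  $\mathcal{T}(!v) = !\mathcal{T}(v)$
  $\mathcal{T}(d\ell) = d\ell$
  $\mathcal{T}(0_p) = 0_p$

Change terms
  $dM ::= \mathbf{let}\ dy, c^y_{fx} = df\ x\ dx\ c^y_{fx}\ \mathbf{in}\ dM \mid \mathbf{let}\ dy = (\overline{dx})\ \mathbf{in}\ dM \mid (dx, dC)$
Cache updates
  $dC ::= () \mid (dC, c^y_{fx}) \mid (dC, x \oplus dx)$
Change values
  $dV ::= (\overline{dV}) \mid dF[\lambda x\ dx\ C.\ dM] \mid d\ell \mid 0_p \mid\ !V$
Change environments
  $dF ::= \varepsilon \mid dF; dD_v$
Change environment entries
  $dD_v ::= D_v \mid dx = dV$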


Fig. 7. CTS differentiation and syntax of its target language $\lambda_{ICAL}$. Beware
$\mathcal{T}(dE[\lambda x\ dx.\ \mathcal{D}^\iota(t)])$ applies a left-inverse of $\mathcal{D}^\iota(t)$ during pattern matching.


CTS conversion and differentiation. These translations use two auxiliary
functions: $\mathcal{C}(t)$, which computes the cache term of a $\lambda_{AL}$
term t, and $\mathcal{U}(t)$, which computes the cache update of t’s derivative.

CTS translation on terms, $\mathcal{T}_t(t')$, accepts as inputs a global term t
and a subterm t' of t. In tail position (t' = x), the translation generates code
to return both the result x and the cache $\mathcal{C}(t)$ of the global term t.
When the transformation visits let-bindings, it outputs extra bindings for
caches $c^y_{fx}$ on function calls and visits the let-body.

Similarly to $\mathcal{T}_t(t')$, CTS differentiation $\mathcal{D}_t(t')$
accepts a global term t and a subterm t' of t. In tail position, the translation
returns both the result change dx and the cache update $\mathcal{U}(t)$. On
let-bindings, it outputs bindings not for y but for dy, outputs extra bindings
for $c^y_{fx}$ as in the previous case, and visits the let-body.
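
The shape of the code produced by these two translations can be sketched by hand in Haskell. The example below is an assumption-laden illustration, not output of the actual transformation: the primitives squareC and incC, their derivatives, and the names fC, dfC and FCache are all hypothetical, integer changes are additive, and the cache is written as a flat tuple rather than the nested tuples of Fig. 6.

```haskell
-- Hypothetical CTS primitives: each returns its result and a cache.
squareC :: Int -> (Int, ())
squareC x = (x * x, ())

dsquareC :: Int -> Int -> () -> (Int, ())
dsquareC x dx () = (2 * x * dx + dx * dx, ())

incC :: Int -> (Int, ())
incC y = (y + 1, ())

dincC :: Int -> Int -> () -> (Int, ())
dincC _ dy () = (dy, ())

-- Source program: f x = let y = square x in let z = inc y in z
-- Cache of f: each intermediate result followed by the cache of its call.
type FCache = (Int, (), Int, ())

-- CTS translation of f: every call also binds a cache, and the tail
-- position returns the result together with the cache of the whole body.
fC :: Int -> (Int, FCache)
fC x =
  let (y, cy) = squareC x
      (z, cz) = incC y
  in (z, (y, cy, z, cz))

-- CTS derivative of f: pattern matches on the cache produced by fC,
-- binds changes instead of values, and returns the updated cache.
dfC :: Int -> Int -> FCache -> (Int, FCache)
dfC x dx (y, cy, z, cz) =
  let (dy, cy') = dsquareC x dx cy
      (dz, cz') = dincC y dy cz
  in (dz, (y + dy, cy', z + dz, cz'))
```

Note that dfC never recomputes y or z: it reads them from the cache and returns them updated, so that the result can be fed to a subsequent call of dfC.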
To handle function definitions, we transform the base environment E through
$\mathcal{T}(E)$ and $\mathcal{T}(\mathcal{D}^\iota(E))$ (translations of
environments are done pointwise, see Appendix D). Since $\mathcal{D}^\iota(E)$
includes E, we describe $\mathcal{T}(\mathcal{D}^\iota(E))$ to also cover
$\mathcal{T}(E)$. Overall, $\mathcal{T}(\mathcal{D}^\iota(E))$ CTS-converts each
source closure $f = E[\lambda x.\ t]$ to a CTS-translated function, with body
$\mathcal{T}_t(t)$, and to the CTS derivative df of f. This CTS derivative
pattern matches on its input cache using cache pattern $\mathcal{C}(t)$. That
way, we make sure that the shape of the cache expected by df is consistent with
the shape of the cache produced by f. The body of derivative df is computed by
CTS-deriving f’s body via $\mathcal{D}_t(t)$.


3.6 Semantics of $\lambda_{CAL}$ and $\lambda_{ICAL}$


An evaluation environment F of $\lambda_{CAL}$ contains both values and cache
values. Values V resemble $\lambda_{AL}$ values v, cache values $V_c$ match
cache terms C and change values dV match $\lambda_{IAL}$ change values dv.
Evaluation environments dF for change terms must also bind change values, so
functions in change closures take not just a base input x and an input change
dx, like in $\lambda_{IAL}$, but also an input cache C. By abuse of notation, we
reuse the same syntax C to both deconstruct and construct caches.

Base terms of the language are evaluated using a conventional big-step
semantics, consisting of two judgments. Judgment
“$F \vdash M \Downarrow (V, V_c)$” is read “Under evaluation environment F, base
term M evaluates to value V and cache $V_c$”. The semantics follows that of
$\lambda_{AL}$; since terms include extra code to produce and carry caches along
the computation, the semantics evaluates that code as well. For space reasons,
we defer semantic rules to Appendix E. Auxiliary judgment
“$F \vdash C \Downarrow V_c$” evaluates cache terms into cache values: it
traverses a cache term and looks up the environment for the values to be cached.

Change terms of $\lambda_{ICAL}$ are also evaluated using a big-step semantics,
which resembles the semantics of $\lambda_{IAL}$ and $\lambda_{CAL}$. Unlike in
those semantics, cache updates $(dC, x \oplus dx)$ are evaluated using the
$\oplus$ operator (overloaded on $\lambda_{CAL}$ values and $\lambda_{ICAL}$
changes). For lack of space, its rules are deferred to Appendix E. This
semantics relies on three judgments. Judgment
“$dF \vdash dM \Downarrow (dV, V_c)$” is read “Under evaluation environment dF,
change term dM evaluates to change value dV and updated cache $V_c$”. The first
auxiliary judgment “$dF \vdash dC \Downarrow V_c$” defines evaluation of cache
update terms. The final auxiliary judgment “$V_c \sim C \to dF$” describes a
limited form of pattern matching used by CTS derivatives: namely, how a cache
pattern C matches a cache value $V_c$ to produce a change environment dF.


3.7 Soundness of CTS Conversion 


The proof is based on a simulation in lock-step, but two subtle points emerge.
First, we must relate $\lambda_{AL}$ environments that do not contain caches
with $\lambda_{CAL}$ environments that do. Second, while evaluating CTS
derivatives, the evaluation environment mixes caches from the base computation
and updated caches computed by the derivatives.
Theorem 3.7 follows because differentiation is sound (Theorem 3.3) and
evaluation commutes with CTS conversion; this last point requires two lemmas.
First, CTS translation of base terms commutes with our semantics:


Lemma 3.5 (Commutation for base evaluations)
For all E, t and v, if $E \vdash t \Downarrow v$, there exists $V_c$ such that
$\mathcal{T}(E) \vdash \mathcal{T}(t) \Downarrow (\mathcal{T}(v), V_c)$.


Second, we need a corresponding lemma for CTS translation of differentiation 
results: intuitively, evaluating a derivative and CTS translating the resulting 
change value must give the same result as evaluating the CTS derivative. But to 
formalize this, we must specify which environments are used for evaluation, and 
this requires two technicalities. 

Assume derivative $\mathcal{D}^\iota(t)$ evaluates correctly in some environment
dE. Evaluating CTS derivative $\mathcal{D}_t(t)$ requires cache values from the
base computation, but they are not in $\mathcal{T}(dE)$! Therefore, we must
introduce a judgment to complete a CTS-translated environment with the
appropriate caches (see Appendix F).

Next, consider evaluating a change term of the form dM = C[dM'], where C is a
standard single-hole change-term context (that is, for $\lambda_{ICAL}$, a
sequence of let-bindings). When evaluating dM, we eventually evaluate dM' in a
change environment dF updated by C: the change environment dF contains both the
updated caches coming from the evaluation of C and the caches coming from the
base computation (which will be updated by the evaluation of dM'). Again, a new
judgment, given in Appendix F, is required to model this process.

With these two judgments, we can state the second key lemma, on the commutation
between evaluation of derivatives and evaluation of CTS derivatives. We give
here an informal version of this lemma; the formal version can be found in
Appendix F.


Lemma 3.6 (Commutation for derivatives evaluation)

If the evaluation of $\mathcal{D}^\iota(t)$ leads to an environment $dE_0$ when
it reaches the differentiated context $\mathcal{D}^\iota(C)$ where t = C[t'],
and if the CTS conversion of t under this environment completed with base (resp.
changed) caches evaluates into a base value $\mathcal{T}(v)$ (resp. a changed
value $\mathcal{T}(v')$) and a base cache value $V_c$ (resp. an updated cache
value $V'_c$), then under an environment containing the caches already updated
by the evaluation of $\mathcal{D}^\iota(C)$ and the base caches to be updated,
the CTS derivative of t evaluates to $\mathcal{T}(dv)$ such that
$v \oplus dv = v'$ and to the updated cache $V'_c$.


Finally, we can state soundness of CTS differentiation. This theorem says 
that CTS derivatives not only produce valid changes for incrementalization but 
that they also correctly consume and update caches. 


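Theorem 3.7 (Soundness of CTS differentiation)
If the following hypotheses hold:


1. $dE \rhd E \hookrightarrow E'$
2. $E \vdash t \Downarrow v$
3. $E' \vdash t \Downarrow v'$

then there exist dv, $V_c$, $V'_c$ and $F_0$ such that:


1. $\mathcal{T}(E) \vdash \mathcal{T}(t) \Downarrow (\mathcal{T}(v), V_c)$
2. $\mathcal{T}(E') \vdash \mathcal{T}(t) \Downarrow (\mathcal{T}(v'), V'_c)$
3. $V_c \sim \mathcal{C}(t) \to F_0$
4. $\mathcal{T}(dE); F_0 \vdash \mathcal{D}_t(t) \Downarrow (\mathcal{T}(dv), V'_c)$
5. $v \oplus dv = v'$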


4 Incrementalization Case Studies 


In this section, we investigate two questions: whether our transformations can
target a typed language like Haskell and whether automatically transformed
programs can perform well. We implement by hand primitives on sequences,
bags and maps in Haskell. The input terms in all case studies are written in a
deep embedding of $\lambda_{AL}$ into Haskell. The transformations generate
Haskell code that uses our primitives and their derivatives.

We run the transformations on three case studies: a computation of the average
value of a bag of integers, a nested loop over two sequences and a more
involved example inspired by the work of Koch et al. [17] on incrementalizing
database queries. For each case study, we make sure that results are consistent
between from scratch recomputation and incremental evaluation; we measure the
execution time for from scratch recomputation and incremental computation as
well as the space consumption of caches. We obtain efficient incremental
programs, that is, ones for which incremental computation is faster than from
scratch recomputation. The measurements indicate that we do get the expected
asymptotic improvement in time of incremental computation over from scratch
recomputation by a linear factor, while the caches grow by a similar linear
factor.

Our benchmarks were compiled by GHC 8.2.2 and run on a 2.20 GHz hexa-core
Intel(R) Xeon(R) CPU E5-2420 v2 with 32 GB of RAM running Ubuntu 14.04. We use
the criterion [21] benchmarking library.


4.1 Averaging Bags of Integers 


Section 2.1 motivates our transformation with a running example of computing 
the average over a bag of integers. We represent bags as maps from elements to 
(possibly negative) multiplicities. Earlier work [7,17] represents bag changes as 
bags of removed and added elements. We use a different representation of bag 
changes that takes advantage of the changes to elements and provide primitives 
on bags and their derivatives. The CTS variant of map, that we call mapC, takes 
a function fC in CTS and a bag as and produces a bag and a cache. The cache 
stores for each invocation of fC, and therefore for each distinct element in as, 
the result of fC of type b and the cache of type c. 

Inspired by Rossberg et al. [23], all higher-order functions (and typically, also 
their caches) are parametric over cache types of their function arguments. Here, 
functions mapC and dmapC and cache type MapC are parametric over the cache 
type c of fC and dfC. 




map :: (a → b) → Bag a → Bag b

data MapC a b c = MapC (Map a (b, c))

mapC :: (a → (b, c)) → Bag a → (Bag b, MapC a b c)

dmapC :: (a → (b, c)) → (a → Δa → c → (Δb, c)) → Bag a → Δ(Bag a) →
  MapC a b c → (Δ(Bag b), MapC a b c)


We wrote the length and sum functions used in our benchmarks in terms of 
primitives map and foldGroup and had their CTS function and CTS derivative 
generated automatically. 
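
The following Haskell sketch illustrates how a cache of the shape MapC can be produced and consumed. It is a simplification under stated assumptions, not the implementation benchmarked here: bags are maps from elements to multiplicities, bag changes are assumed to be maps of multiplicity deltas (the representation of earlier work, not the one used in this paper), and the hypothetical derivative dmapNilC only covers the case where the function change is nil. The helper toBag is also an assumption.

```haskell
import qualified Data.Map.Strict as M

type Bag a = M.Map a Int   -- element -> (possibly negative) multiplicity
type DBag a = M.Map a Int  -- assumed change representation: multiplicity deltas

-- For each distinct input element: the result of fC and its cache.
newtype MapC a b c = MapC (M.Map a (b, c))

-- Build a bag from weighted elements, dropping zero multiplicities.
toBag :: Ord b => [(b, Int)] -> Bag b
toBag = M.filter (/= 0) . M.fromListWith (+)

mapC :: (Ord a, Ord b) => (a -> (b, c)) -> Bag a -> (Bag b, MapC a b c)
mapC fC xs = (bs, MapC results)
  where
    results = M.mapWithKey (\a _ -> fC a) xs
    bs = toBag [(fst (results M.! a), n) | (a, n) <- M.toList xs]

-- Derivative restricted to a nil function change: fC is only invoked on
-- elements that are not yet in the cache, so the work is proportional to
-- the size of the change, not to the size of the base bag.
dmapNilC :: (Ord a, Ord b)
         => (a -> (b, c)) -> DBag a -> MapC a b c -> (DBag b, MapC a b c)
dmapNilC fC dxs (MapC cache) = (dbs, MapC cache')
  where
    cache' = M.foldlWithKey' extend cache dxs
    extend acc a _
      | M.member a acc = acc
      | otherwise      = M.insert a (fC a) acc
    dbs = toBag [(fst (cache' M.! a), dn) | (a, dn) <- M.toList dxs]
```

This sketch never prunes cache entries for elements whose multiplicity drops to zero; a real implementation would have to decide whether to do so.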

We evaluate whether we can produce an updated result with daverageC
shown in Sect. 2.1 faster than by from scratch recomputation with average. We
expect the speedup of daverageC to depend on the size n of the input bag. We
fix an input bag of size n as the bag containing the numbers from 1 to n. We
define a change that inserts the integer 1 into the bag. To measure execution
time of from scratch recomputation, we apply average to the input bag updated
with the change. To measure execution time of the CTS function averageC, we
apply averageC to the input bag updated with the change. To measure execution
time of the CTS derivative daverageC, we apply daverageC to the input bag,
the change and the cache produced by averageC when applied to the input bag.
In all three cases we ensure that all results and caches are fully forced so as
not to hide any computational cost behind laziness.


[Plots omitted: execution time versus input size (0 to 100) for the series
“From scratch”, “CTS” and “Derivative”.]

(a) Benchmark results for average (b) Benchmark results for totalPrice


Fig. 8. Benchmark results for average and totalPrice 


The plot in Fig. 8a shows execution time versus the size n of the base input. 
To produce the base result and cache, the CTS transformed function averageC 
takes longer than the original average function takes to produce just the result. 
Producing the updated result incrementally is slower than from scratch recom- 
putation for small input sizes, but because of the difference in time complexity 
becomes faster as the input size grows. The size of the cache grows linearly with 
the size of the input, which is not optimal for this example. We leave optimizing 
the space usage of examples like this to future work. 
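
For intuition, the benchmarked functions can be approximated by the hand-written Haskell sketch below. It is not the generated code from Sect. 2.1: it assumes non-empty bags of integers, bag changes represented as multiplicity deltas, integer division, and an additive change to the result. It also keeps only the sum and the length in the cache (the type AverageCache is hypothetical), which is the kind of smaller cache that the generated program does not yet achieve, and its derivative does not need the base input.

```haskell
import qualified Data.Map.Strict as M

type Bag = M.Map Int Int   -- element -> multiplicity
type DBag = M.Map Int Int  -- assumed change representation: multiplicity deltas

sumBag :: Bag -> Int
sumBag = M.foldlWithKey' (\acc x n -> acc + x * n) 0

lengthBag :: Bag -> Int
lengthBag = M.foldl' (+) 0

-- From scratch computation (assumes a non-empty bag).
average :: Bag -> Int
average xs = sumBag xs `div` lengthBag xs

type AverageCache = (Int, Int)  -- cached sum and length

-- CTS-style function: returns the result and the intermediate results.
averageC :: Bag -> (Int, AverageCache)
averageC xs = (s `div` n, (s, n))
  where
    s = sumBag xs
    n = lengthBag xs

-- CTS-style derivative: traverses only the change, reads the old sum and
-- length from the cache, and returns the result change and updated cache.
daverageC :: DBag -> AverageCache -> (Int, AverageCache)
daverageC dxs (s, n) = (s' `div` n' - s `div` n, (s', n'))
  where
    s' = s + sumBag dxs
    n' = n + lengthBag dxs
```

With this representation the derivative's running time depends only on the size of the change, which is the source of the asymptotic gap visible in Fig. 8a.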
4.2 Nested Loops over Two Sequences


Next, we consider CTS differentiation on a higher-order example. To incremen- 
talize this example efficiently, we have to enable detecting nil function changes 
at runtime by representing function changes as closures that can be inspected 
by incremental programs. Our example here is the Cartesian product of two 
sequences computed in terms of functions map and concat. 


cartesianProduct :: Sequence a → Sequence b → Sequence (a, b)
cartesianProduct xs ys = concatMap (λx → map (λy → (x, y)) ys) xs
concatMap :: (a → Sequence b) → Sequence a → Sequence b
concatMap f xs = concat (map f xs)


We implemented incremental sequences and related primitives following
Firsov and Jeltsch [9]: our change operations and first-order operations (such
as concat) reuse their implementation. On the other hand, we must extend
higher-order operations such as map to handle non-nil function changes and
caching. A correct and efficient CTS derivative dmapC has to work differently
depending on whether the given function change is nil or not: for a non-nil
function change it has to go over the input sequence; for a nil function change
it has to avoid that.

Cai et al. [7] use static analysis to conservatively approximate nil function
changes as changes to terms that are closed in the original program. But in this
example the function argument (λy → (x, y)) to map in cartesianProduct is not
a closed term. It is, however, crucial for the asymptotic improvement that we
avoid looping over the inner sequence when the change to the free variable x in
the change environment is $0_x$.

To enable runtime nil change detection, we apply closure conversion to the 
original program and explicitly construct closures and changes to closures. While 
the only valid change for closed functions is their nil change, for closures we can 
have non-nil function changes. A function change df, represented as a closure 
change, is nil exactly when all changes it closes over are nil. 

We represent closed functions and closures as variants of the same type. Cor- 
respondingly we represent changes to a closed function and changes to a closure 
as variants of the same type of function changes. We inspect this representation 
at runtime to find out if a function change is a nil change. 


data Fun a b c where
  Closed :: (a → (b, c)) → Fun a b c
  Closure :: (e → a → (b, c)) → e → Fun a b c
data Δ(Fun a b c) where
  DClosed :: (a → Δa → c → (Δb, c)) → Δ(Fun a b c)
  DClosure :: (e → Δe → a → Δa → c → (Δb, c)) → e → Δe → Δ(Fun a b c)
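
The runtime test for nil function changes can be sketched as follows. This is a stripped-down illustration under assumptions: the class NilTestable, the type DInt and the functions isNilDFun and affected are hypothetical names, the code and base environment carried by the constructors above are dropped, and only the environment change is kept.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Assumed class of change types that can be tested for being nil.
class NilTestable d where
  isNil :: d -> Bool

-- Additive integer change, nil exactly when it is zero.
newtype DInt = DInt Int

instance NilTestable DInt where
  isNil (DInt n) = n == 0

-- Function changes: a change to a closed function, or a closure change
-- carrying the change to the environment it closes over.
data DFun
  = DClosed
  | forall de. NilTestable de => DClosure de

-- The only valid change to a closed function is nil; a closure change is
-- nil exactly when the environment change it closes over is nil.
isNilDFun :: DFun -> Bool
isNilDFun DClosed = True
isNilDFun (DClosure de) = isNil de

-- Positions of a sequence that a derivative of map has to revisit when
-- the sequence itself is unchanged: none for a nil function change,
-- all of them otherwise.
affected :: DFun -> [a] -> [Int]
affected df xs
  | isNilDFun df = []
  | otherwise = [0 .. length xs - 1]
```

In cartesianProduct, the inner map receives a closure change over x; when the change to x is nil, a check of this kind lets the derivative skip the inner sequence entirely.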


We use the same benchmark setup as in the benchmark for the average computation
on bags. The input of size n is a pair of sequences (xs, ys). Each sequence
initially contains the integers from 1 to n. Updating the result in reaction to
a change dxs to the outer sequence xs takes less time than updating the result
in reaction to a change dys to the inner sequence ys. While a change to the
outer sequence xs results in an easily located change in the output sequence, a
change to the inner sequence ys results in a change that needs a lot more
calculation to find the elements it affects. We benchmark changes to the outer
sequence xs and the inner sequence ys separately, where the change to one
sequence is the insertion of a single integer 1 at position 1 and the change to
the other one is the nil change.


[Plots omitted: execution time versus input size (0 to 100) for the series
“From scratch”, “CTS” and “Derivative”.]

(a) Benchmark results for Cartesian product changing inner sequence.
(b) Benchmark results for Cartesian product changing outer sequence.


Fig. 9. Benchmark results for cartesianProduct 


Figure 9 shows execution time versus input size. In this example, again,
preparing the cache takes longer than from scratch recomputation alone. The
speedup of incremental computation over from scratch recomputation increases
with the size of the base input sequences because of the difference in time
complexity. Eventually we do get speedups for both kinds of changes (to the
inner and to the outer sequence), but for changes to the outer sequence we get a
speedup earlier, at a smaller input size. The size of the cache grows
superlinearly in this example.


4.3 Indexed Joins of Two Bags 


Our goal is to show that we can compose primitive functions into larger and 
more complex programs and apply CTS differentiation to get a fast incremental 
program. We use an example inspired by the DBToaster literature [17]. In this
example we have a bag of orders and a bag of line items. An order is a pair of an 
order key and an exchange rate. A line item is a pair of an order key and a price. 
We build an index mapping each order key to the sum of all exchange rates of 
the orders with this key and an index from order key to the sum of the prices 
of all line items with this key. We then merge the two maps by key, multiplying 
corresponding sums of exchange rates and sums of prices. We compute the total 
price of the orders and line items as the sum of those products. 
type Order = (ℤ, ℤ)
type LineItem = (ℤ, ℤ)
totalPrice :: Bag Order → Bag LineItem → ℤ
totalPrice orders lineItems = let
    orderIndex = groupBy fst orders
    orderSumIndex = Map.map (Bag.foldMapGroup snd) orderIndex
    lineItemIndex = groupBy fst lineItems
    lineItemSumIndex = Map.map (Bag.foldMapGroup snd) lineItemIndex
    merged = Map.merge orderSumIndex lineItemSumIndex
    total = Map.foldMapGroup multiply merged
  in total
groupBy :: (a → k) → Bag a → Map k (Bag a)
groupBy keyOf bag =
  Bag.foldMapGroup (λa → Map.singleton (keyOf a) (Bag.singleton a)) bag


Unlike DBToaster, we assume our program is already transformed to explicitly 
use indexes, as above. Because our indexes are maps, we implemented a change 
structure, CTS primitives and their CTS derivatives for maps. 

To build the indexes, we use a groupBy function built from primitive func- 
tions foldMapGroup on bags and singleton for bags and maps respectively. The 
CTS function groupByC and the CTS derivative dgroupByC are automatically 
generated. While computing the indexes with groupBy is self-maintainable, merg- 
ing them is not. We need to cache and incrementally update the intermediately 
created indexes to avoid recomputing them. 

We evaluate the performance in the same way we did in the other case studies.
The input of size n is a pair of bags where both contain the pairs (i, i) for i
between 1 and n. The change is an insertion of the order (1, 1) into the orders
bag. For sufficiently large inputs, our CTS derivative of the original program
produces updated results much faster than from scratch recomputation, again
because of a difference in time complexity as indicated by Fig. 8b. The size of
the cache grows linearly with the size of the input in this example. This is
unavoidable, because we need to keep the indexes.
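
Why the indexes must be cached can be seen in a small hand-written Haskell sketch. It makes several assumptions that differ from the program above: bags are replaced by lists, the merge is assumed to keep only keys present in both indexes, and the names sumIndex, totalPriceC, dInsertOrder and Cache are hypothetical. The derivative only handles the insertion of a single order.

```haskell
import qualified Data.Map.Strict as M

type Order = (Int, Int)     -- (order key, exchange rate)
type LineItem = (Int, Int)  -- (order key, price)
type Index = M.Map Int Int  -- order key -> sum of the second components

sumIndex :: [(Int, Int)] -> Index
sumIndex = M.fromListWith (+)

-- From scratch computation: sum over keys of (sum of rates * sum of prices).
totalPrice :: [Order] -> [LineItem] -> Int
totalPrice orders lineItems =
  sum (M.elems (M.intersectionWith (*) (sumIndex orders) (sumIndex lineItems)))

-- Cache: both indexes and the current total.
type Cache = (Index, Index, Int)

totalPriceC :: [Order] -> [LineItem] -> (Int, Cache)
totalPriceC orders lineItems = (total, (oi, li, total))
  where
    oi = sumIndex orders
    li = sumIndex lineItems
    total = sum (M.elems (M.intersectionWith (*) oi li))

-- Derivative for inserting one order: adding `rate` to the order sum of
-- key k changes the total by rate times the line item sum of k, which is
-- looked up in the cached index instead of being recomputed.
dInsertOrder :: Order -> Cache -> (Int, Cache)
dInsertOrder (k, rate) (oi, li, total) =
  (dTotal, (M.insertWith (+) k rate oi, li, total + dTotal))
  where
    dTotal = rate * M.findWithDefault 0 k li
```

Without the cached line item index, computing dTotal would require a pass over all line items, which is exactly the recomputation the cache avoids.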


4.4 Limitations and Future Work 


Typing of CTS programs. Functions of the same type $f_1, f_2 : A \to B$ can be
transformed to CTS functions $f_1 : A \to (B, C_1)$, $f_2 : A \to (B, C_2)$ with
different cache types $C_1$, $C_2$, since cache types depend on the
implementation. This heterogeneous typing of translated functions poses
difficult typing issues, e.g. what is the translated type of a list
$(A \to B)$? We cannot hide cache types behind existential quantifiers because
they would be too abstract for derivatives, which only work on very specific
cache types. We can fix this problem with some runtime overhead by using a
single type Cache, defined as a tagged union of all cache types, or maybe with
more sophisticated type systems (like first-class translucent sums, open
existentials or Typed Adapton’s refinement types [12]) that could track cache
types properly.
In any case, we believe that these machineries would add a lot of complexity
without helping much with the proof of correctness. Indeed, the simulation rela- 
tion is more handy here because it maintains a global invariant about the whole 
evaluations (typically the consistency of cache types between base computations 
and derivatives), not many local invariants about values as types would. 

One might wonder why caches could not be totally hidden from the programmer by
embedding them in the derivatives themselves; or in other words, why we did not
simply translate functions of type $A \to B$ into functions of type
$A \to B \times (\Delta A \to \Delta B)$. We tried this as well; but unlike
automatic differentiation, we must remember and update caches according to input
changes (especially when receiving a sequence of such changes as in Sect. 2.1).
Returning the updated cache to the caller works; we tried closing over the
caches in the derivative, but this ultimately fails (because we could receive
function changes to the original function, but those would need access to such
caches).


Comprehensive performance evaluation. This paper focuses on theory and we 
leave benchmarking in comparison to other implementations of incremental com- 
putation to future work. The examples in our case study were rather simple 
(except perhaps for the indexed join). Nevertheless, the results were encouraging 
and we expect them to carry over to more complex examples, but not to all 
programs. A comparison to other work would also include a comparison of space 
usage for auxiliary data structure, in our case the caches. 


Cache pruning via absence analysis. To reduce memory usage and runtime over- 
head, it should be possible to automatically remove from transformed programs 
any caches or cache fragments that are not used (directly or indirectly) to com- 
pute outputs. Liu [19] performs this transformation on CTS programs by using 
absence analysis, which was later extended to higher-order languages by Sergey 
et al. [25]. In lazy languages, absence analysis removes thunks that are not needed 
to compute the output. We conjecture that the analysis could remove unused 
caches or inputs, if it is extended to not treat caches as part of the output. 


Unary vs n-ary abstraction. We only show our transformation correct for 
unary functions and tuples. But many languages provide efficient support for 
applying curried functions such as div : ℤ → ℤ → ℤ. Naively transforming 
such a curried function to CTS would produce a function divC of type 
ℤ → ((ℤ → (ℤ, DivC₂)), DivC₁) with DivC₁ = (), which adds excessive overhead. 
In Sect. 2 and our evaluation we use curried functions and never need to use this 
naive encoding, but only because we always invoke functions of known arity. 


5 Related Work 


Cache-transfer-style. The work of Liu [19] has been the fundamental inspiration 
for this work, but her approach has no correctness proof and is restricted to a 
first-order untyped language. Moreover, while the idea of cache-transfer-style is 
similar, it is unclear whether her approach to incrementalization would extend to higher-order 
programs. Firsov and Jeltsch [9] also approach incrementalization by code trans- 
formation, but their approach does not deal with changes to functions. Instead of 
transforming functions written in terms of primitives, they provide combinators 
to write CTS functions and derivatives together. On the other hand, they extend 
their approach to support mutable caches, while restricting to immutable ones 
as we do might lead to a logarithmic slowdown. 


Finite differencing. Incremental computation on collections or databases by 
finite differencing has a long tradition [6,22]. The most recent and impressive 
line of work is the one on DBToaster [16,17], which is a highly efficient app- 
roach to incrementalize queries over bags by combining iterated finite differenc- 
ing with other program transformations. They show asymptotic speedups both 
in theory and through experimental evaluations. Changes are only allowed for 
datatypes that form groups (such as bags or certain maps), but not for instance 
for lists or sets. Similar ideas were recently extended to higher-order and nested 
computation [18], though only for datatypes that can be turned into groups. 
Koch et al. [18] emphasize that iterated differentiation is necessary to obtain 
efficient derivatives; however, ANF conversion and remembering intermediate 
results appear to address the same problem, similarly to the field of automatic 
differentiation [27]. 


Logical relations. To study correctness of incremental programs we use a logical 
relation among base values v₁, updated values v₂ and changes dv. To define a 
logical relation for an untyped λ-calculus we use a step-indexed logical relation, 
following Ahmed [4], Appel and McAllester [5]; in particular, our definitions are 
closest to the ones by Acar et al. [3], who also work with an untyped language, 
big-step semantics and (a different form of) incremental computation. However, 
they do not consider first-class changes. Technically, we use environments rather 
than substitution, and index our big-step semantics differently. 


Dynamic incrementalization. The approaches to incremental computation with 
the widest applicability are in the family of self-adjusting computation [1,2], 
including its descendant Adapton [14]. These approaches incrementalize pro- 
grams by combining memoization and change propagation: after creating a trace 
of base computations, updated inputs are compared with old ones in O(1) to 
find corresponding outputs, which are updated to account for input modifica- 
tions. Compared to self-adjusting computation, Adapton only updates results 
that are demanded. As usual, incrementalization is not efficient on arbitrary 
programs, but only on programs designed so that input changes produce small 
changes to the computation trace; refinement type systems have been designed 
to assist in this task [8,12]. To identify matching inputs, Nominal Adapton [13] 
replaces input comparisons by pointer equality with first-class labels, enabling 
more reuse. 


6 Conclusion 


We have presented a program transformation which turns a functional program 
into its derivative and efficiently shares redundant computations between them 
thanks to a statically computed cache. 

Although our first practical case studies show promising results, this paper 
focused on putting CTS differentiation on solid theoretical ground. For the 
moment, we have only scratched the surface of the incrementalization oppor- 
tunities opened by CTS primitives and their CTS derivatives: in our opinion, 
exploring the design space for cache data structures will lead to interesting new 
results in purely functional incremental programming. 


Abstract. We present a behavioural typing system for a higher-order 
timed calculus using session types to model timed protocols. Behavioural 
typing ensures that processes in the calculus perform actions in the time- 
windows prescribed by their protocols. We introduce duality and subtyp- 
ing for timed asynchronous session types. Our notion of duality allows 
typing a larger class of processes with respect to previous proposals. 
Subtyping is critical for the precision of our typing system, especially in 
the presence of session delegation. The composition of dual (timed asyn- 
chronous) types enjoys progress when using an urgent receive semantics, 
in which receive actions are executed as soon as the expected message 
is available. Our calculus increases the modelling power of extant calculi 
on timed sessions, adding a blocking receive primitive with timeout and 
a primitive that consumes an arbitrary amount of time in a given range. 


Keywords: Session types · Timers · Duality · π-calculus 


1 Introduction 


Time is at the basis of many real-life protocols. These include common 
client-server interactions such as, for example, “An SMTP server SHOULD have a timeout 
of at least 5 minutes while it is awaiting the next command from the sender” [22]. 
By protocol, we mean application-level specifications of interaction patterns (via 
message passing) among distributed applications. An extensive literature offers 
theories and tools for formal analysis of timed protocols, modelled for instance 
as timed automata [3,26,34] or Message Sequence Charts [2]. These works allow 
reasoning about the properties of protocols, defined as formal models. Recent work, 


This work has been partially supported by EPSRC EP/N035372/1, EP/K011715/1, 
EP/N027833/1, EP/K034413/1, EP/L00058X/1, EP/N028201/1, Aut. Reg. of 
Sardinia projects Sardcoin and Smart collaborative engineering, FCT through 
project Confident PTDC/EEI-CTP/4503/2014 and the LASIGE Research Unit 
UID/CEC/00408/2019. We thank Julien Lange for his advice and comments. 

© The Author(s) 2019 


L. Caires (Ed.): ESOP 2019, LNCS 11423, pp. 583-610, 2019. 
https://doi.org/10.1007/978-3-030-17184-1_21 


584 L. Bocchi et al. 


based on session types, focus on the relationship between time-sensitive proto- 
cols, modelled as timed extensions of session types, and their implementations 
abstracted as processes in some timed calculus. The relationship between pro- 
tocols and processes is given in terms of static behavioural typing [12,15] or 
run-time monitoring [6,7,30] of processes against types. Existing work on timed 
session types [7,12,15,30] is based on simple abstractions for processes which do 
not capture time-sensitive primitives such as blocking (as well as non-blocking) 
receive primitives with timeout and time-consuming actions with variable, yet 
bounded, duration. This paper provides a theory of asynchronous timed session 
types for a calculus that features these two primitives. We focus on the asyn- 
chronous scenario, as modern distributed systems (e.g., web) are often based 
on asynchronous communications via FIFO channels [4,33]. The link between 
protocols and processes is given in terms of static behavioural typing, checking 
for punctuality of interactions with respect to protocol prescriptions. Unlike 
previous work on asynchronous timed session types [12], our type system can 
check processes against protocols that are not wait-free. In wait-free protocols, 
the time-windows for corresponding send and receive actions have an empty 
intersection. We illustrate wait-freedom using a protocol modelled as two timed 
session types, each owning a set of clocks (with no shared clocks between types). 


\[ S_C = {!}\mathtt{Command}(x < 5, \{x\}).S'_C \qquad S_S = {?}\mathtt{Command}(y < 5, \{y\}).S'_S \tag{1} \]


The protocol in (1) involves a client S_C with a clock x, and a server S_S with a 
clock y (with both x and y initially set to 0). Following the protocol, the client 
must send a message of type Command within 5 min, reset x, and continue as S'_C. 
Dually, the server must be ready to receive a command with a timeout of 5 min, 
reset y, and continue as S'_S. The model in (1) is not wait-free: the intersection 
of the time-windows for the send and receive actions is non-empty (the time- 
windows actually coincide). The protocol in (2), where the server must wait until 
after the client’s deadline to read the message, is wait-free. 
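
As a rough illustration of the wait-freedom condition, the following Python sketch checks whether the time-windows of one send action and its matching receive action intersect. It assumes that both clocks start at 0 and advance together, so that guards on x and y can be read as windows on a common timeline; the Window type and the function names are hypothetical helpers, not part of the paper.

```python
from typing import NamedTuple


class Window(NamedTuple):
    """A time-window with possibly open endpoints."""
    lo: float
    lo_closed: bool
    hi: float
    hi_closed: bool


def overlap(a: Window, b: Window) -> bool:
    """True iff the two windows share at least one point in time."""
    lo, hi = max(a.lo, b.lo), min(a.hi, b.hi)
    # An endpoint belongs to the intersection only if every window
    # that ends exactly there includes it.
    lo_closed = all(w.lo_closed for w in (a, b) if w.lo == lo)
    hi_closed = all(w.hi_closed for w in (a, b) if w.hi == hi)
    return lo < hi or (lo == hi and lo_closed and hi_closed)


def wait_free(send: Window, receive: Window) -> bool:
    """Wait-free: the send and receive windows have an empty intersection."""
    return not overlap(send, receive)


# Protocol (1): send while x < 5, receive while y < 5.
SEND_1 = Window(0, True, 5, False)
RECV_1 = Window(0, True, 5, False)
# Protocol (2): send while x < 5, receive exactly when y = 5.
SEND_2 = Window(0, True, 5, False)
RECV_2 = Window(5, True, 5, True)
```

Under these assumptions, the windows of protocol (1) coincide, so it is not wait-free, while those of protocol (2) are disjoint.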


\[ {!}\mathtt{Command}(x < 5, \{x\}).S'_C \qquad {?}\mathtt{Command}(y = 5, \{y\}).S'_S \tag{2} \]


Patterns like the one in (1) are common (e.g., the SMTP fragment mentioned 
at the beginning of this introduction) but, unfortunately, they are not wait-free, 
hence ruled out in previous work [12]. Arguably, (2) is an impractical wait-free 
variant of (1): the client must always wait for at least 5 min to have the message 
read, no matter how early this message was sent. The definition of protocols 
for our typing system (which allows for protocols that are not wait-free) is based on a 
notion of asynchronous timed duality, and on a subtyping relation that provides 
accuracy of typing, especially in the case of channel passing. 


Asynchronous timed duality. In the untimed scenario, each session type has one 
unique dual that is obtained by changing the polarities of the actions (send vs. 
receive, and selection vs. branching). For example, the dual of a session type S 
that sends an integer and then receives a string is a session type $\overline{S}$ that receives 
an integer and then sends a string. 


\[ S = {!}\mathtt{Int}.{?}\mathtt{String} \qquad \overline{S} = {?}\mathtt{Int}.{!}\mathtt{String} \]


Duality characterises well-behaved systems: the behaviour described by the com- 
position of dual types has no communication mismatches (e.g., unexpected mes- 
sages, or messages with values of unexpected types) nor deadlocks. In the timed 
scenario, this is no longer true. Consider a timed extension of session types (using 
the model of time in timed automata [3]), and of (untimed) duality so that dual 
send/receive actions have equivalent time constraints and resets. The example 
below shows a timed type S with its dual $\overline{S}$, where S owns clock x, and $\overline{S}$ owns 
clock y (with x and y initially set to 0): 


\[ S = {!}\mathtt{Int}(x < 1, x).{?}\mathtt{String}(x \le 2) \qquad \overline{S} = {?}\mathtt{Int}(y < 1, y).{!}\mathtt{String}(y \le 2) \]


Here S sends an integer at any time satisfying x < 1, and then resets x. After 
that, S receives a string at any time satisfying x ≤ 2. The timed dual of S 
is obtained by keeping the same time constraints (and renaming the clock, 
to make it clear that clocks are not shared). To illustrate our point, we use 
the semantics from timed session types [12], borrowed from Communicating 
Timed Automata [23]. This semantics is separated, in the sense that only time 
actions may ‘take time’, while all other actions (e.g., communications) are 
instantaneous.¹ The aforementioned semantics allows for the following 
execution of $S \mid \overline{S}$: 


\[
\begin{array}{rll}
S \mid \overline{S} & \xrightarrow{\;0.4\ {!}\mathtt{Int}\;} & {?}\mathtt{String}(x \le 2) \mid \overline{S} \qquad (\text{clock values: } x = 0,\ y = 0.4) \\
 & \xrightarrow{\;0.6\ {?}\mathtt{Int}\;} & {?}\mathtt{String}(x \le 2) \mid {!}\mathtt{String}(y \le 2) \qquad (\text{clock values: } x = 0.6,\ y = 0) \\
 & \xrightarrow{\;2\ {!}\mathtt{String}\;} & {?}\mathtt{String}(x \le 2) \qquad (\text{clock values: } x = 2.6,\ y = 2)
\end{array}
\]


where: (i) the system makes a time step of 0.4, then S sends the integer and 
resets x, yielding a state where x = 0 and y = 0.4; (ii) the system makes a 
time step of 0.6, then $\overline{S}$ receives the integer and resets y, yielding a state where 
x = 0.6 and y = 0; (iii) the system makes a time step of 2, then the continuation 
of $\overline{S}$ sends the string, when y = 2 and x = 2.6. In (iii), the string was sent too 
late: constraint x ≤ 2 of the receiving endpoint is now unsatisfiable. The system 
cannot do any further legal step, and is stuck. 
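
The clock arithmetic of this execution can be replayed with a short Python sketch. It is only an illustration: the function run and its parameter are hypothetical names, exact fractions are used to avoid floating-point rounding, and the guards on the string are read as non-strict (at most 2), which is what a send at y = 2 requires.

```python
from fractions import Fraction as F


def run(receive_delay):
    """Replay the execution of S composed with its dual.

    receive_delay is the time that passes between the integer being
    sent and the integer being received.
    """
    x = y = F(0)
    # (i) time step of 0.4, then S sends the integer and resets x
    x, y = x + F(2, 5), y + F(2, 5)
    x = F(0)
    # (ii) time passes, then the dual receives the integer and resets y
    x, y = x + receive_delay, y + receive_delay
    y = F(0)
    # (iii) time step of 2, then the dual sends the string (allowed, y is 2)
    x, y = x + 2, y + 2
    string_on_time = x <= 2  # guard of ?String at the receiving endpoint
    return x, y, string_on_time


late = run(F(3, 5))    # reception deferred by 0.6, as in the execution above
prompt = run(F(0))     # reception without delay
```

With a reception delay of 0.6 the final clock values are x = 2.6 and y = 2, so the guard on the receiving side fails; with no delay both clocks end at 2 and the guard holds.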


Urgent receive semantics. The example above shows that, in the timed asyn- 
chronous scenario, the straightforward extension of duality to the timed scenario 
does not necessarily characterise well-behaved communications. We argue, however, 
that the execution of $S \mid \overline{S}$, in particular the time reduction with label 
0.6, does not reflect the semantics of most common receive primitives. In fact, 
most mainstream programming languages implement urgent receive semantics 


1 Separated semantics can describe situations where actions have an associated 
duration. 




for receive actions. We call a semantics urgent receive when receive actions are 
executed as soon as the expected message is available, given that the guard of 
that action is satisfied. Conversely, non-urgent receive semantics allows receive 
actions to fire at any time satisfying the time constraint, as long as the message 
is in the queue. The aforementioned reduction with label 0.6 is permitted by 
non-urgent receive semantics such as the one in [23], since it defers the reception 
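of the integer despite the integer being ready for reception and the guard (y < 1) 
being satisfied, but not by urgent receive semantics. Urgent receive semantics 
allows, instead, the following execution for $S \mid \overline{S}$: 


\[
\begin{array}{rll}
S \mid \overline{S} & \xrightarrow{\;0.4\ {!}\mathtt{Int}\;} & {?}\mathtt{String}(x \le 2) \mid \overline{S} \qquad (\text{clock values: } x = 0,\ y = 0.4) \\
 & \xrightarrow{\;{?}\mathtt{Int}\;} & {?}\mathtt{String}(x \le 2) \mid {!}\mathtt{String}(y \le 2) \qquad (\text{clock values: } x = 0,\ y = 0) \\
 & \xrightarrow{\;2\ {!}\mathtt{String}\;} & {?}\mathtt{String}(x \le 2) \qquad (\text{clock values: } x = 2,\ y = 2)
\end{array}
\]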


If S sends the integer when x = 0.4, then $\overline{S}$ must receive the integer 
immediately, when y = 0.4. At this point, both endpoints reset their respective 
clocks, and the communication will continue in sync. Urgent receive primitives 
are common; some examples are the non-blocking WaitFreeReadQueue.read() 
and blocking WaitFreeReadQueue.waitForData() of Real-Time Java [13], and 
the receive primitives in Erlang and Golang. Urgent receive semantics makes 
interactions “more synchronous” but still as asynchronous as real-life programs. 
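
The difference between the two receive semantics can be sketched in Python as follows. The functions below are hypothetical helpers: they assume a single message whose arrival time is known and a receive guard given as a closed window [lo, hi] on the receiver's clock.

```python
def urgent_receive_time(arrival, lo, hi):
    """Urgent receive: fire at the earliest time at which the message
    is available and the guard holds; None if the window is missed."""
    t = max(arrival, lo)
    return t if t <= hi else None


def non_urgent_window(arrival, lo, hi):
    """Non-urgent receive: any time in the returned closed interval is
    admissible, so the reception may be deferred; None if missed."""
    t = max(arrival, lo)
    return (t, hi) if t <= hi else None
```

For a message arriving at time 0.4 and a guard window [0, 1], urgent receive fixes the reception at 0.4, whereas non-urgent receive admits any time up to 1, which is what permits the deferred reception discussed above.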


A calculus for timed asynchronous processes. Our calculus features two 
time-sensitive primitives. The first is a parametric receive operation $a^n(b).P$ on a 
channel a, with a timeout n that can be ∞ or any number in ℝ≥0. The 
parametric receive captures a range of receive primitives: non-blocking (n = 0), 
blocking without timeout (n = ∞), or blocking with timeout (n ∈ ℝ≥0). The 
second primitive is a time-consuming action, delay(δ).P, where δ is a constraint 
expressing the time-window for the time consumed by that action. Delay 
processes model primitives like Thread.sleep(n) in real-time Java [13] or, more 
generally, any time-consuming action, with δ being an estimation of the delay of 
computation. 
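
The three uses of the parametric receive can be approximated with Python's standard queue module. This is only an analogy, not the calculus itself: the function receive is a hypothetical wrapper, the timeout is measured in seconds, and returning None on timeout is a choice of this sketch.

```python
import math
import queue


def receive(channel: queue.Queue, n: float):
    """Sketch of a receive with timeout n on a channel.

    n = 0 is non-blocking, n = math.inf blocks without timeout, and any
    other n >= 0 blocks for at most n seconds. Returns None on timeout.
    """
    try:
        if n == 0:
            return channel.get_nowait()
        if math.isinf(n):
            return channel.get()          # blocks until a message arrives
        return channel.get(timeout=n)
    except queue.Empty:
        return None
```

For example, receive(q, 0) on an empty queue returns None at once, while receive(q, 5.0) waits up to five seconds for a message.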

Processes in our calculus abstract implementations of protocols given as pairs 
of dual types. Consider the processes below. 


\[ P_C = \mathtt{delay}(x < 3).\overline{a}\,\mathtt{HELO}.P'_C \qquad P_S = \mathtt{delay}(x = 5).a^{0}(b).P'_S \qquad Q_S = a^{5}(b).Q'_S \]


Processes abiding by the protocols in (2) could be as follows: P_C for the client S_C, 
and P_S for the server S_S. The client process P_C performs a time-consuming action 
for up to 3 min, then sends command HELO to the server, and continues as P'_C. 
The server process P_S sleeps for exactly 5 min, receives the message immediately 
(without blocking), and continues as P'_S. A process for the protocol in (1) could 
instead be the parallel composition of P_C, again for the client, and Q_S for the 
server. Process Q_S uses a blocking primitive with timeout; the server now blocks 
on the receive action with a timeout of 5 min, and continues as Q'_S as soon as 
a message is received. The blocking receive primitive with timeout is crucial 
to model processes typed against protocols one can express with asynchronous 
timed duality, in particular those that are not wait-free. 


A type system for timed asynchronous processes. The relationship between types 
and processes in our calculus is given as a typing system. Well-typed processes 
are ensured to communicate at the times prescribed by their types. This result 
is given via Subject Reduction (Theorem 4), establishing that well-typedness is 
preserved by reduction. In our timed scenario, Subject Reduction holds under 
receive liveness, an assumption on the interaction structure of processes. This 
assumption is orthogonal to time. To characterise the interaction structures of a 
timed process we erase timing information from that process (time erasure). 
Receive liveness requires that, whenever a time-erased process is waiting for 
a message, the corresponding message is eventually provided by the rest of the 
system. While receive liveness is not needed for Subject Reduction in untimed 
systems [21], it is required for timed processes. This reflects the natural intuition 
that if an untimed process violates progress, then its timed counterpart may miss 
deadlines. Notably, we can rely on existing behavioural checking techniques from 
the untimed setting to ensure receive liveness [17]. 

Receive liveness is not required for Subject Reduction in a related work on 
asynchronous timed session types [12]. The dissimilarity in the assumptions is 
only apparent; it derives from differences in the two semantics for processes. 
When our processes cannot proceed correctly (e.g., in case of missed deadlines) 
they reduce to a failed state, whereas the processes in [12] become stuck (indi- 
cating violation of progress). 


Synopsis. In Sect. 2 we introduce the syntax and the formation rules for asyn- 
chronous timed session types. In Sect. 3, we give a modular Labelled Transition 
System (LTS) for types in isolation (Sect. 3.1) and for compositions of types 
(Sect. 3.3). The subtyping relation is given in Sect. 3.2 and motivated in Example 
8, after introducing the typing rules. We introduce timed asynchronous duality 
and its properties in Sect. 4. Remarkably, the composition of dual timed asyn- 
chronous types enjoys progress when using an urgent receive semantics (Theo- 
rem 1). Section 5 presents a calculus for timed processes and Sect. 6 introduces its 
typing system. The properties of our typing system—Subject Reduction (The- 
orem 4) and Time Safety (Theorem 5)—are introduced in Sect. 7. Conclusions 
and related works are in Sect. 8. Proofs and additional material can be found in 
the online report [11]. 


2 Asynchronous Timed Session Types 


Clocks and predicates. We use the model of time from timed automata [3]. Let 
X be a finite set of clocks, let x₁, …, xₙ range over clocks, and let each clock 
take values in ℝ≥0. Let t₁, …, tₙ range over non-negative real numbers and 
n₁, …, nₙ range over non-negative rationals. The set G(X) of predicates over X 
is defined by the following grammar. 


\[ \delta ::= \mathtt{true} \mid x > n \mid x = n \mid x - y > n \mid x - y = n \mid \neg\delta \mid \delta_1 \wedge \delta_2 \qquad \text{where } x, y \in X \]

We derive false, <, ≥, ≤ in the standard way. Predicates of the form x − y > n 
and x − y = n are called diagonal predicates; in these cases we assume x ≠ y. 
Notation cn(δ) stands for the set of clocks in δ. 
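
The grammar of predicates can be mirrored by a small Python syntax tree, together with satisfaction over a valuation and the function cn. This is a minimal sketch: the class and function names are illustrative assumptions, and valuations are represented as dictionaries from clock names to numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tt:                 # true
    pass


@dataclass(frozen=True)
class Gt:                 # x > n
    x: str
    n: float


@dataclass(frozen=True)
class Eq:                 # x = n
    x: str
    n: float


@dataclass(frozen=True)
class DiagGt:             # x - y > n
    x: str
    y: str
    n: float


@dataclass(frozen=True)
class DiagEq:             # x - y = n
    x: str
    y: str
    n: float


@dataclass(frozen=True)
class Not:
    g: object


@dataclass(frozen=True)
class And:
    l: object
    r: object


def sat(v, g):
    """Does valuation v (a dict from clocks to values) satisfy guard g?"""
    if isinstance(g, Tt):
        return True
    if isinstance(g, Gt):
        return v[g.x] > g.n
    if isinstance(g, Eq):
        return v[g.x] == g.n
    if isinstance(g, DiagGt):
        return v[g.x] - v[g.y] > g.n
    if isinstance(g, DiagEq):
        return v[g.x] - v[g.y] == g.n
    if isinstance(g, Not):
        return not sat(v, g.g)
    if isinstance(g, And):
        return sat(v, g.l) and sat(v, g.r)
    raise TypeError(g)


def cn(g):
    """Set of clocks occurring in guard g."""
    if isinstance(g, (Gt, Eq)):
        return {g.x}
    if isinstance(g, (DiagGt, DiagEq)):
        return {g.x, g.y}
    if isinstance(g, Not):
        return cn(g.g)
    if isinstance(g, And):
        return cn(g.l) | cn(g.r)
    return set()


# Derived forms, obtained in the standard way.
def le(x, n):
    return Not(Gt(x, n))


def lt(x, n):
    return And(Not(Gt(x, n)), Not(Eq(x, n)))


def ge(x, n):
    return Not(lt(x, n))
```

The derived comparisons are encoded through negation and conjunction only, matching the way the grammar keeps > and = as primitive.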


Clock valuation and resets. A clock valuation v : X → ℝ≥0 returns the time of 
the clocks in X. We write v + t for the valuation mapping all x ∈ X to v(x) + t, 
v₀ for the initial valuation (mapping all clocks to 0), and, more generally, v_t for 
the valuation mapping all clocks to t. Let v ⊨ δ denote that δ is satisfied by v. 
A reset predicate λ over X is a subset of X. When λ is ∅ then no reset occurs, 
otherwise the assignment for each x ∈ λ is set to 0. We write v[λ ↦ 0] for the 
clock assignment that is like v everywhere except that it assigns 0 to all clocks 
in λ. 
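
The operations on valuations can be sketched in Python with dictionaries from clock names to values. The function names initial, elapse and reset are illustrative assumptions; each returns a new valuation and leaves its argument unchanged.

```python
def initial(clocks):
    """The initial valuation: every clock is mapped to 0."""
    return {x: 0 for x in clocks}


def elapse(v, t):
    """The valuation v + t: every clock advances by t."""
    return {x: c + t for x, c in v.items()}


def reset(v, lam):
    """Like v everywhere, except that the clocks in lam are set to 0."""
    return {x: (0 if x in lam else c) for x, c in v.items()}
```

For instance, letting 3 time units elapse from the initial valuation over clocks x and y and then resetting x yields x = 0 and y = 3.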


Types. Timed session types, hereafter just types, have the following syntax: 

\[
\begin{array}{rcl}
T & ::= & (\delta, S) \mid \mathtt{Nat} \mid \mathtt{Bool} \mid \ldots \\
S & ::= & {!}T(\delta, \lambda).S \mid {?}T(\delta, \lambda).S \mid \oplus\{l_i(\delta_i, \lambda_i) : S_i\}_{i \in I} \mid \&\{l_i(\delta_i, \lambda_i) : S_i\}_{i \in I} \mid \mu\alpha.S \mid \alpha \mid \mathtt{end}
\end{array}
\]


Sorts T include base types (Nat, Bool, etc.), and sessions (δ, S). Messages of 
type (δ, S) allow a participant involved in a session to delegate the remaining 
behaviour S; upon delegation the sender will no longer participate in the 
delegated session and the receiver will execute the protocol described by S under any 
clock assignment satisfying δ. We denote the set of types with T. 

Type !T(δ, λ).S models a send action of a payload with sort T. The sending 
action is allowed at any time that satisfies the guard δ. The clocks in λ are 
reset upon sending. Type ?T(δ, λ).S models the dual receive action of a payload 
with sort T. The receiving types require the endpoint to be ready to receive the 
message in the precise time window specified by the guard. 

Type ⊕{l_i(δ_i, λ_i) : S_i}_{i∈I} is a select action: the party chooses a branch i ∈ I, 
where I is a finite set of indices, selects the label l_i, and continues as prescribed 
by S_i. Each branch is annotated with a guard δ_i and reset λ_i. A branch j can 
be selected at any time allowed by δ_j. The dual type is &{l_i(δ_i, λ_i) : S_i}_{i∈I} 
for branching actions. Each branch is annotated with a guard and a reset. The 
endpoint must be ready to receive the label for j at any time allowed by δ_j (or 
until another branch is selected). 

Recursive type μα.S associates a type variable α to a recursion body S. We 
assume that type variables are guarded in the standard way (i.e., they only occur 
under actions or branches). We let A denote the set of type variables. 

Type end models successful termination. 


2.1 Type Formation 


The grammar for types allows generating types that are not implementable in 
practice, such as the one shown in Example 1. 

Example 1 (Junk-types). Consider S in (3) under initial clock valuation v₀. 

\[ S = {?}T(x < 5, \emptyset).{!}T(x < 2, \emptyset).\mathtt{end} \tag{3} \]


The specified endpoint must be ready to receive a message in the time-window 
between 0 and 5 time units, as we evaluate x < 5 in vo. Assume that this 
receive action happens when x = 3, yielding a new state in which: (i) the clock 
valuation maps x to 3, and (ii) the endpoint must perform a send action while 
x < 2. Evidently, (ii) is no longer possible in the new clock valuation, as 
x < 2 is now unsatisfiable. We could amend (3) in several ways: (a) by resetting 
x after the receive action; (b) by restricting the guard of the receive action (e.g., 
x < 2 instead of x < 5); or (c) by relaxing the guard of the send action. All 
these amendments would, however, yield a different type. 
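
The problem in Example 1 amounts to a one-line check on clock values, sketched below in Python. The helper name is hypothetical; the sketch relies only on the fact that clocks never decrease, so an upper bound that has been passed can never be met again.

```python
def enabled_now_or_later(x, upper):
    """True iff some delay t >= 0 gives x + t < upper."""
    return x < upper


receive_at = 3                                   # allowed by the guard x < 5
stuck = not enabled_now_or_later(receive_at, 2)  # send guard x < 2 is missed
after_reset = enabled_now_or_later(0, 2)         # amendment (a): reset x on receive
restricted = all(enabled_now_or_later(t, 2) for t in (0, 0.5, 1.9))  # amendment (b)
```

Receiving at x = 3 leaves the send guard unreachable, whereas resetting x on receipt, or restricting the receive guard to x < 2, keeps the send action enabled.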


In the remainder of this section we introduce formation rules to rule out 
junk types such as the one in Example 1 and characterise types that are well-formed. 
Intuitively, well-formed types allow, at any point, to perform some action in the 
present time or at some point in the future, unless the type is end. 


Judgments. The formation rules for types are defined on judgments of the form 

\[ \Delta;\ \delta \vdash S \]


where Δ is an environment assigning type variables to guards, and δ is a guard 
in G(X). Δ is used as an invariant to form recursive types. Guard δ collects the 
possible ‘pasts’ from which the next action in S could be executed (unless S is 
end). We use notation ↓δ (the past of δ) for a guard δ' such that v ⊨ δ' if and 
only if ∃t : v + t ⊨ δ. For example, ↓(1 < x < 2) = x < 2 and ↓(x > 3) = true. 
Similarly, we use the notation δ[λ ↦ 0] to denote a guard in which all clocks in 
λ are reset. For example, (x ≤ 3 ∧ y ≤ 2)[x ↦ 0] = (x = 0 ∧ y ≤ 2). We use the 
notation δ₁ ⊆ δ₂ whenever, for all v, v ⊨ δ₁ implies v ⊨ δ₂. The past and reset of 
a guard can be inferred algorithmically, and ⊆ is decidable [8]. 
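
For a restricted class of guards, the past, the reset and the entailment check can be sketched in a few lines of Python. The sketch makes strong simplifying assumptions that are not in the paper: guards are conjunctions of non-strict bounds lo ≤ x ≤ hi, stored as a dictionary from clocks to pairs, with no diagonal predicates; guards are assumed satisfiable; and the past is computed only for guards over a single clock, where dropping the lower bound is exact.

```python
import math


def past(g):
    """Past of a guard: valuations from which g is reachable by delay.
    Exact only for guards that constrain at most one clock."""
    assert len(g) <= 1, "sketch handles single-clock guards only"
    return {x: (0, hi) for x, (lo, hi) in g.items()}


def reset(g, lam):
    """Guard in which every clock in lam has been reset to 0."""
    return {**g, **{x: (0, 0) for x in lam}}


def entails(g1, g2):
    """Inclusion of g1 in g2 for (satisfiable) box-shaped guards."""
    for x, (lo2, hi2) in g2.items():
        lo1, hi1 = g1.get(x, (0, math.inf))
        if lo1 < lo2 or hi1 > hi2:
            return False
    return True
```

Under these assumptions, the past of 1 ≤ x ≤ 2 is x ≤ 2, the past of x ≥ 3 is equivalent to true, and resetting x in (x ≤ 3 ∧ y ≤ 2) gives (x = 0 ∧ y ≤ 2).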


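\[
\dfrac{}{\Delta;\ \mathtt{true} \vdash \mathtt{end}}\ [\text{end}]
\qquad
\dfrac{\circ \in \{!, ?\} \quad \Delta;\ \gamma \vdash S \quad \delta[\lambda \mapsto 0] \subseteq \gamma \quad T\ \text{base type}}{\Delta;\ {\downarrow}\delta \vdash \circ T(\delta, \lambda).S}\ [\text{interact}]
\]

\[
\dfrac{\circ \in \{!, ?\} \quad \Delta;\ \gamma \vdash S \quad \delta[\lambda \mapsto 0] \subseteq \gamma \quad T = (\delta', S') \quad \emptyset;\ \gamma' \vdash S' \quad \delta' \subseteq \gamma'}{\Delta;\ {\downarrow}\delta \vdash \circ T(\delta, \lambda).S}\ [\text{delegate}]
\]

\[
\dfrac{\circ \in \{\oplus, \&\} \quad \forall i \in I:\ \Delta;\ \gamma_i \vdash S_i \quad \delta_i[\lambda_i \mapsto 0] \subseteq \gamma_i}{\Delta;\ {\downarrow}\bigvee_{i \in I} \delta_i \vdash \circ\{l_i(\delta_i, \lambda_i) : S_i\}_{i \in I}}\ [\text{choice}]
\]

\[
\dfrac{\Delta, \alpha : \delta;\ \delta \vdash S}{\Delta;\ \delta \vdash \mu\alpha.S}\ [\text{rec}]
\qquad
\dfrac{}{\Delta, \alpha : \delta;\ \delta \vdash \alpha}\ [\text{var}]
\]
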
Rule [end] states that the terminated type is well-formed against any Δ. 
The guard of the judgement is true since end is a final state (as end has no 
continuation, morally, the constraint of its continuation is always satisfiable). 
Rule [interact] ensures that the past of the current action δ entails the past 
of the subsequent action γ (considering resets if necessary): this rules out types 
in which the subsequent action can only be performed in the past. Rules [end] 
and [interact] are illustrated by the three examples below. 


Example 2. The judgment below shows a type being discarded after an 
application of rule [interact]: 


\[ \emptyset;\ x < 3 \vdash {?}\mathtt{Nat}(1 < x < 3, \emptyset).{!}\mathtt{Nat}(1 < x < 2, \emptyset).\mathtt{end} \tag{4} \]


The premise of [interact] would be δ ⊆ γ, which does not hold for δ = 1 < 
x < 3 and γ = x < 2. This means that guard (1 < x < 3, ∅) of the first 
action may lead to a state in which guard 1 < x < 2 for the subsequent action 
is unsatisfiable. If we amend the type in (4) by adding a reset in the first action, 
we obtain a well-formed type. We show its formation below, where for simplicity 
we omit obvious preconditions like Nat base type, etc. 


\[
\dfrac{
  \dfrac{
    \dfrac{}{\emptyset;\ \mathtt{true} \vdash \mathtt{end}}\ [\text{end}]
    \qquad 1 < x < 2 \subseteq \mathtt{true}
  }{\emptyset;\ x < 2 \vdash {!}\mathtt{Nat}(1 < x < 2, \emptyset).\mathtt{end}}\ [\text{interact}]
  \qquad x = 0 \subseteq x < 2
}{\emptyset;\ x < 3 \vdash {?}\mathtt{Nat}(1 < x < 3, \{x\}).{!}\mathtt{Nat}(1 < x < 2, \emptyset).\mathtt{end}}\ [\text{interact}]
\]

Rule [delegate] behaves as [interact], with two additional premises on 
the delegated session: (1) S' needs to be well-formed, and (2) the guard of the 
next action in S' needs to be satisfiable with respect to δ'. Guard δ' is used to 
ensure a correspondence between the state of the delegating endpoint and that 
of the receiving endpoint. Rule [choice] is similar to [interact] but requires 
that there is at least one viable branch (this is accomplished by considering the 
weaker past ↓⋁_{i∈I} δ_i) and checking each branch for formation. Rules [rec] and 
[var] are for recursive types and variables, respectively. In [rec] the guard δ 
can be easily computed by taking the past of the next action in S (or 
the disjunction if S is a branching or selection). An algorithm for deciding type 
formation can be found in [11]. 
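
To make rules [end] and [interact] concrete, the following Python sketch checks formation for a small fragment. It is not the algorithm of [11]: it assumes a single clock, guards given as non-strict bounds lo ≤ x ≤ hi, and types built only from end and send or receive actions on base sorts (no delegation, choice or recursion). Types are encoded as the string "end" or a tuple (polarity, sort, guard, resets, continuation); all names are illustrative.

```python
import math


def past(g):
    """Past of a single-clock guard: drop the lower bound."""
    return {x: (0, hi) for x, (lo, hi) in g.items()}


def reset(g, lam):
    """Guard in which every clock in lam has been reset to 0."""
    return {**g, **{x: (0, 0) for x in lam}}


def entails(g1, g2):
    """Inclusion of g1 in g2 for (satisfiable) box-shaped guards."""
    for x, (lo2, hi2) in g2.items():
        lo1, hi1 = g1.get(x, (0, math.inf))
        if lo1 < lo2 or hi1 > hi2:
            return False
    return True


def formation_guard(S):
    """Return the guard of the formation judgment for S, or None if
    no derivation exists using [end] and [interact]."""
    if S == "end":
        return {}                         # [end]: the guard is true
    _polarity, _sort, delta, lam, cont = S
    gamma = formation_guard(cont)
    if gamma is None or not entails(reset(delta, lam), gamma):
        return None                       # premise on the reset guard fails
    return past(delta)                    # [interact]: conclusion uses the past


def well_formed(S):
    """Well-formed against the initial valuation (all clocks at 0)."""
    gamma = formation_guard(S)
    return gamma is not None and all(lo <= 0 <= hi for lo, hi in gamma.values())


SEND = ("!", "Nat", {"x": (1, 2)}, set(), "end")
BAD = ("?", "Nat", {"x": (1, 3)}, set(), SEND)    # the type in (4), no reset
GOOD = ("?", "Nat", {"x": (1, 3)}, {"x"}, SEND)   # amended type, resets x
```

On the two types of Example 2 (read with non-strict bounds), the check rejects the version without a reset and accepts the amended one, returning the guard x ≤ 3 for it.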


Definition 1 (Well-formed types). We say that S is well-formed against 
clock valuation v if ∅; δ ⊢ S and v ⊨ δ, for some guard δ. We say that S is 
well-formed if it is well-formed against v₀. 


We will tacitly assume types are well-formed, unless otherwise specified. The 
intuition of well-formedness is that if Δ; δ ⊢ S then S can be run (using the 
types semantics given in Sect. 3) under any clock valuation v such that v ⊨ δ. 
In the sequel, we take (well-formed) types equi-recursively [31]. 


3 Asynchronous Session Types Semantics and Subtyping 


We give a compositional semantics of types. First, we focus on types in isolation 
from their environment and from their queues, which we call simple type con- 
figurations. Next we define subtyping for simple type configurations. Finally, we 
consider systems (i.e., composition of types communicating via queues). 


\[
\dfrac{v \models \delta}{(v, {!}T(\delta, \lambda).S) \xrightarrow{!T} (v[\lambda \mapsto 0], S)}\ [\text{snd}]
\qquad
\dfrac{v \models \delta}{(v, {?}T(\delta, \lambda).S) \xrightarrow{?T} (v[\lambda \mapsto 0], S)}\ [\text{rcv}]
\]

\[
\dfrac{v \models \delta_j \quad j \in I}{(v, \oplus\{l_i(\delta_i, \lambda_i) : S_i\}_{i \in I}) \xrightarrow{!l_j} (v[\lambda_j \mapsto 0], S_j)}\ [\text{sel}]
\qquad
\dfrac{v \models \delta_j \quad j \in I}{(v, \&\{l_i(\delta_i, \lambda_i) : S_i\}_{i \in I}) \xrightarrow{?l_j} (v[\lambda_j \mapsto 0], S_j)}\ [\text{bra}]
\]

\[
\dfrac{(v, S[\mu\alpha.S/\alpha]) \xrightarrow{\ell} (v', S')}{(v, \mu\alpha.S) \xrightarrow{\ell} (v', S')}\ [\text{rec}]
\qquad
\dfrac{}{(v, S) \xrightarrow{t} (v + t, S)}\ [\text{time}]
\]

Fig. 1. LTS for simple type configurations 


3.1 Types in Isolation 


The behaviour of simple type configurations is described by the Labelled 
Transition System (LTS) on pairs (v, S) over (V × S), where clock valuation v gives the 
values of clocks in a specific state. The LTS is defined over the following labels 


\[ \ell ::= {!}m \mid {?}m \mid t \mid \tau \qquad m ::= d \mid l \]


Label !m denotes an output action of message m and ?m an input action of m. 
A message m can be a sort T (that can be either a higher order message (δ, S) 
or base type), or a branching label l. The LTS for single types is defined as the 
least relation satisfying the rules in Fig. 1. Rules [snd], [rcv], [sel], and [bra] can 
only happen if the constraint of the next action is satisfied in the current clock 
valuation. Rule [rec] unfolds recursive types, and [time] always lets time elapse. 
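
The rules [snd], [rcv] and [time] can be sketched as two Python step functions on simple type configurations. The sketch covers only a fragment, under assumptions of its own: types are the string "end" or tuples (polarity, sort, guard, resets, continuation), guards are non-strict bounds per clock, and choice and recursion are omitted. A step that is not permitted returns None.

```python
def sat(v, g):
    """Does valuation v satisfy a guard given as bounds per clock?"""
    return all(lo <= v[x] <= hi for x, (lo, hi) in g.items())


def time_step(cfg, t):
    """[time]: time can always elapse; every clock advances by t."""
    v, S = cfg
    return ({x: c + t for x, c in v.items()}, S)


def action_step(cfg, label):
    """[snd] / [rcv]: fire the next action if the label matches and the
    guard holds in the current valuation; then apply the resets."""
    v, S = cfg
    if S == "end":
        return None
    polarity, sort, delta, lam, cont = S
    if label != polarity + sort or not sat(v, delta):
        return None
    return ({x: (0 if x in lam else c) for x, c in v.items()}, cont)


CONT = ("?", "String", {"x": (0, 2)}, set(), "end")
S = ("!", "Int", {"x": (0, 1)}, {"x"}, CONT)
cfg = ({"x": 0}, S)
```

Starting from cfg, a time step of 0.5 followed by the action "!Int" resets x and moves to the continuation, whereas the same action after a time step of 1.5 is refused because the guard no longer holds.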

Let s, s', s_i (i ∈ ℕ) range over simple type configurations (v, S). We write 
$s \xrightarrow{\ell}$ when there exists s' such that $s \xrightarrow{\ell} s'$, and write $s \xrightarrow{t\,\ell}$ for $s \xrightarrow{t}\xrightarrow{\ell}$. 


3.2 Asynchronous Timed Subtyping 


We define subtyping as a partial relation on simple type configurations. As in 
other subtyping relations for session types we consider send and receive actions 
dually [14,16,19]. Our subtyping relation is covariant on output actions and 
contra-variant on input actions, similarly to that of [14]. In this way, our 
subtyping S <: S' captures the intuition that a process well-typed against S can be 
safely substituted with a process well-typed against S'. Definition 2 introduces 
a notation that is useful in the rest of this section. 

Definition 2 (Future enabled send/receive). Action ℓ is future enabled in 
s if ∃t : $s \xrightarrow{t\,\ell}$. We write $s \stackrel{!}{\Rightarrow}$ (resp. $s \stackrel{?}{\Rightarrow}$) if there exists a sending action !m 
(resp. a receiving action ?m) that is future enabled in s. 


As common in session types, the communication structure does not allow for 
mixed choices: the grammar of types enforces choices to be either all input 
(branching actions), or output (selection actions). From this fact it follows that, 
given s, reductions $s \stackrel{!}{\Rightarrow}$ and $s \stackrel{?}{\Rightarrow}$ cannot hold simultaneously. 

Definition 3 (Timed Type Simulation). Fix s₁ = (v₁, S₁) and s₂ = 
(v₂, S₂). A relation R ⊆ (V × S)² is a timed type simulation if (s₁, s₂) ∈ R 
implies the following conditions: 


1. S₁ = end implies S₂ = end 
2. $s_1 \xrightarrow{t\,!m_1} s_1'$ implies ∃s₂', m₂ : $s_2 \xrightarrow{t\,!m_2} s_2'$, (m₂, m₁) ∈ S, (s₁', s₂') ∈ R 
3. $s_2 \xrightarrow{t\,?m_2} s_2'$ implies ∃s₁', m₁ : $s_1 \xrightarrow{t\,?m_1} s_1'$, (m₁, m₂) ∈ S, (s₁', s₂') ∈ R 
4. $s_1 \stackrel{?}{\Rightarrow}$ implies $s_2 \stackrel{?}{\Rightarrow}$, and $s_2 \stackrel{!}{\Rightarrow}$ implies $s_1 \stackrel{!}{\Rightarrow}$ 

where S is the following extension of R to messages: (1) (T, T') ∈ S if T 
and T' are base types, and T' is a subtype of T by sorts subtyping, e.g., 
(int, nat) ∈ S; (2) (l, l) ∈ S; (3) ((δ₁, S₁), (δ₂, S₂)) ∈ S, if ∀v₁ ⊨ δ₁ ∃v₂ ⊨ 
δ₂ : ((v₁, S₁), (v₂, S₂)) ∈ R and ∀v₂ ⊨ δ₂ ∃v₁ ⊨ δ₁ : ((v₁, S₁), (v₂, S₂)) ∈ R. 
Intuitively, if (s₁, s₂) ∈ R then any environment that can safely interact with 
s₂ can do so with s₁. We write that s₂ simulates s₁ whenever s₁ and s₂ are in 
a timed type simulation. Below, s₂ simulates s₁: 


\[ s_1 = (v_0, {!}\mathtt{nat}(x < 5, \emptyset).\mathtt{end}) \qquad s_2 = (v_0, {!}\mathtt{int}(x \le 10, \emptyset).\mathtt{end}) \]

Conversely, s₁ does not simulate s₂ because of condition (2). Precisely, s₂ can 
make a transition $s_2 \xrightarrow{10\ !\mathtt{int}}$ that cannot be matched by s₁ for two reasons: guard 
x < 5 is no longer satisfiable when x = 10, and (nat, int) ∉ S since int is not 
a subtype of nat. For receive actions, instead, we could substitute s with s' if 
s' had at least the receiving capabilities of s. Condition (4) in Definition 3 rules 
out relations that include, e.g., ((v, ?T(true, ∅).end), (v, !T(true, ∅).end)). 
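
For types consisting of a single send action, condition (2) reduces to a comparison of deadlines and sorts, which the following Python sketch spells out. It is a specialised illustration, not a general simulation checker: a type is encoded as a pair (sort, hi) standing for a send allowed while x ≤ hi from the initial valuation, bounds are taken as non-strict, and the subsort table is an assumption covering only nat and int.

```python
# Pairs (T', T) such that T' is a subsort of T.
SUBSORT = {("nat", "nat"), ("int", "int"), ("nat", "int")}


def simulated_by(s_a, s_b):
    """Condition (2) for single-send types started at the initial
    valuation: every timed send of s_a must be matched by s_b at the
    same time, with a payload sort that s_b's sort subsumes."""
    (sort_a, hi_a), (sort_b, hi_b) = s_a, s_b
    same_times = hi_a <= hi_b          # each t <= hi_a also has t <= hi_b
    payload_ok = (sort_a, sort_b) in SUBSORT
    return same_times and payload_ok


s1, s2 = ("nat", 5), ("int", 10)
```

With these encodings, simulated_by(s1, s2) holds while simulated_by(s2, s1) fails, both on the deadline and on the payload sort.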


Live simple type configurations. In our subtyping definition we are interested in
simple type configurations that are not stuck. Consider the example below:

$(\nu,\ !\mathtt{Int}(x < 10, \emptyset).\mathtt{end})$   (5)

The simple type configuration in (5) would not be stuck if $\nu = \nu_0$, but would
be stuck for any $\nu = \nu'[x \mapsto 10]$. Definition 4 gives a formal definition of simple
type configurations that are not stuck, i.e., that are live.


Definition 4 (Live simple type configuration). A simple type configuration
$(\nu, S)$ is said to be live if:

$S = \mathtt{end}$ or $\exists t, m : (\nu, S) \xrightarrow{t\,\circ m}$ $\quad (\circ \in \{!, ?\})$

Observe that for all well-formed $S$, $(\nu_0, S)$ is live.
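
For a type whose first action is guarded by an interval on a single clock, liveness reduces to asking whether some future instant still satisfies the guard. The sketch below illustrates this special case only; `Interval` and `is_live` are hypothetical names, lower bounds are assumed to be non-strict, and `None` is used to stand for the type end.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Interval:
    """Guard lo <= x < hi (or lo <= x <= hi when hi_closed) on a single clock x."""
    lo: float = 0.0
    hi: float = float("inf")
    hi_closed: bool = False

def is_live(clock: float, guard: Optional[Interval]) -> bool:
    """guard=None stands for the type `end`, which is live by definition."""
    if guard is None:
        return True
    # Time only moves forward, so the earliest candidate instant is max(clock, lo).
    earliest = max(clock, guard.lo)
    return earliest <= guard.hi if guard.hi_closed else earliest < guard.hi
```

On configuration (5), `is_live(0, Interval(hi=10))` is true, whereas `is_live(10, Interval(hi=10))` is false because no delay can bring the clock back below 10.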




Subtyping for simple type configurations. We can now define subtyping for simple 
type configurations and state its decidability. 


Definition 5 (Subtyping). $s_1$ is a subtype of $s_2$, written $s_1 <: s_2$, if there
exists a timed type simulation $R$ on live simple type configurations such that
$(s_1, s_2) \in R$. We write $S_1 <: S_2$ when $(\nu_0, S_1) <: (\nu_0, S_2)$. Abusing the notation,
we write $m <: m'$ iff there exists $S$ such that $(m, m') \in S$.


Subtyping has been shown to be decidable in the untimed setting [19] and 
in the timed first order setting [6]. In [6], decidability is shown through a reduc- 
tion to model checking of timed automata networks. The result in [6] can be 
extended to higher-order messages using the techniques in [3], based on finite 
representations (called regions) of possibly infinite sets of clock valuations. 


Proposition 1 (Decidability of subtyping). Checking if $(\delta_1, S_1) <: (\delta_2, S_2)$
is decidable.


3.3 Types with Queues, and Their Composition 


As interactions are asynchronous, the behaviour of types must capture the states
in which messages are in transit. To do this, we extend simple type configurations
with queues. A configuration $\mathcal{S}$ is a triple $(\nu, S, M)$ where $\nu$ is a clock valuation, $S$
is a type, and $M$ is an unbounded FIFO queue of the following form:

$M ::= \emptyset \mid m; M$

$M$ contains the messages sent by the co-party of $S$ and not yet received by $S$. We
write $M$ for $M; \emptyset$, and call $(\nu, S, M)$ initial if $\nu = \nu_0$ and $M = \emptyset$.


Composing types. Configurations are composed into systems. We denote S | S’ 
as the parallel composition of the two configurations S and S’. 

The labelled transition rules for systems are given in Fig. 2. Rule (snd) is
for send actions. A send action can occur only if the time constraint of $S$ is
satisfied (by the premise, which uses either rule [snd] or [sel] in Fig. 1). Rule
(que) models actions on queues. A queue is always ready to receive any message
$m$. Rule (rcv) is for receive actions, where a message is read from the queue. A
receive action can only occur if the time constraint of $S$ is satisfied (by the
premise, which uses either rule [rcv] or [bra] in Fig. 1). The message is removed
from the head of the queue of the receiving configuration. The third clause in
the premise uses the notion of subtyping (Definition 3) for basic sorts, labels,
and higher-order messages. Rule (crcv) is the action of a configuration pulling a
message off its queue. Rule (com) is for communication between a sending
configuration and a buffer. Rule (ctime) lets time elapse in the same way for all
configurations in a system. Rule (time) models time passing for single
configurations. Time passing is subject to two constraints, expressed by the second and
third conditions in the premise. The second condition, $(\nu, S) \overset{!}{\Rightarrow}$ implies
$(\nu', S) \overset{!}{\Rightarrow}$, requires the time action $t$ to preserve the satisfiability of some
send action. For example, in configuration $(\nu_0,\ !T(x < 2, \emptyset).S,\ \emptyset)$, a transition
with label 2 would not preserve any send action (hence would not be allowed),
while a transition with label 1.8 would be allowed by this condition. The third
condition, $\forall t' < t : (\nu + t', S, M) \stackrel{\tau}{\nrightarrow}$, checks that there is no ready message
to be received in the queue. This is to model urgency: when a configuration is in
a receiving state and a message is in the queue then the receiving action must
happen without delay. For example, $(\nu_0,\ ?T(x < 2, \emptyset).S,\ \emptyset)$ can make a transition
with label 1, but $(\nu_0,\ ?T(x < 2, \emptyset).S,\ m)$ cannot make any time transition. Below
we show two examples of system executions. Example 3 illustrates a good
communication, thanks to urgency. We also illustrate in Example 4 that without an
urgent semantics the system in Example 3 gets stuck.

[Figure: labelled transition rules (snd), (que), (rcv), (crcv), (com), (ctime), and (time).]

Fig. 2. LTS for systems. We omit the symmetric rules of (crcv), and (csnd).


Example 3 (A good communication). Consider the following types:

$S_1 =\ !T(x < 1, x).?T(x < 2).\mathtt{end} \qquad S_2 =\ ?T(y < 1, y).!T(y < 2).\mathtt{end}$

System $(\nu[x \mapsto 0], S_1, \emptyset) \mid (\nu[y \mapsto 0], S_2, \emptyset)$ can make a time step with label
0.5 by (ctime), yielding the system in (6):

$(\nu[x \mapsto 0.5], S_1, \emptyset) \mid (\nu[y \mapsto 0.5], S_2, \emptyset)$   (6)

The system in (6) can move by a $\tau$ step thanks to (com): the left-hand side
configuration makes a step with label $!T$ by (snd) while the right-hand side
configuration makes a step $?T$ by (que), yielding system (7) below.

$(\nu[x \mapsto 0],\ ?T(x < 2).\mathtt{end},\ \emptyset) \mid (\nu[y \mapsto 0.5], S_2, T)$   (7)

The right-hand side configuration in the system in (7) must urgently receive
message $T$ due to the third clause in the premise of rule (time). Hence, the only
possible step forward for (7) is by (crcv), yielding the system in (8).

$(\nu[x \mapsto 0],\ ?T(x < 2).\mathtt{end},\ \emptyset) \mid (\nu[y \mapsto 0],\ !T(y < 2).\mathtt{end},\ \emptyset)$   (8)

Example 4 (In absence of urgency). Without urgency, the system in (7) from
Example 3 may get stuck. Assume the third clause of rule (time) was removed:
this would allow (7) to make a time step with label 0.5, followed by a step by
(rcv) yielding the system in (9), where clock $y$ is reset after the receive action,

$(\nu[x \mapsto 0.5],\ ?T(x < 2).\mathtt{end},\ \emptyset) \mid (\nu[y \mapsto 0],\ !T(y < 2).\mathtt{end},\ \emptyset)$   (9)

followed by a $\tau$ step by (com) reaching the following state:

$(\nu[x \mapsto 2.5],\ ?T(x < 2).\mathtt{end},\ T) \mid (\nu[y \mapsto 0],\ \mathtt{end},\ \emptyset)$   (10)

The message in the queue in (10) will never be received as the guard $x < 2$ is not
satisfiable now or at any point in the future. This system is stuck. Instead, thanks
to urgency, the clocks of the configurations of system (8) have been 'synchronised'
after the receive action, preventing the system from getting stuck.


4 Timed Asynchronous Duality 


We introduce a timed extension of duality. As in untimed duality, we let 
each send/select action be complemented by a corresponding receive/branching 
action. Moreover, we require time constraints and resets to match. 


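Definition 6 (Timed duality). The dual type $\overline{S}$ of $S$ is defined as follows:

$$
\begin{array}{lll}
\overline{!T(\delta, \lambda).S} =\ ?T(\delta, \lambda).\overline{S} &
\overline{?T(\delta, \lambda).S} =\ !T(\delta, \lambda).\overline{S} &
\overline{\mu\alpha.S} = \mu\alpha.\overline{S} \\
\overline{\oplus\{l_i(\delta_i, \lambda_i) : S_i\}_{i \in I}} = \&\{l_i(\delta_i, \lambda_i) : \overline{S_i}\}_{i \in I} & &
\overline{\alpha} = \alpha \\
\overline{\&\{l_i(\delta_i, \lambda_i) : S_i\}_{i \in I}} = \oplus\{l_i(\delta_i, \lambda_i) : \overline{S_i}\}_{i \in I} & &
\overline{\mathtt{end}} = \mathtt{end}
\end{array}
$$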
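
Definition 6 is a structural recursion, which the following sketch transcribes over a small abstract syntax. The class names (`End`, `Var`, `Rec`, `Msg`, `Choice`) are assumptions made for illustration; guards are kept as opaque strings and resets as sets of clock names, since duality leaves both untouched.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class End:
    pass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Rec:
    var: str
    body: object

@dataclass(frozen=True)
class Msg:
    send: bool                 # True for !T, False for ?T
    payload: str
    guard: str                 # kept opaque: duality does not inspect it
    resets: FrozenSet[str]
    cont: object

@dataclass(frozen=True)
class Choice:
    select: bool               # True for selection, False for branching
    # each branch is (label, guard, resets, continuation)
    branches: Tuple[Tuple[str, str, FrozenSet[str], object], ...]

def dual(s):
    """Swap send/receive and selection/branching; keep guards and resets."""
    if isinstance(s, (End, Var)):
        return s
    if isinstance(s, Rec):
        return Rec(s.var, dual(s.body))
    if isinstance(s, Msg):
        return Msg(not s.send, s.payload, s.guard, s.resets, dual(s.cont))
    if isinstance(s, Choice):
        return Choice(not s.select,
                      tuple((l, g, r, dual(c)) for l, g, r, c in s.branches))
    raise TypeError(f"not a session type: {s!r}")
```

Applying `dual` twice returns the original type, and only the direction of each action changes while every guard and reset set is preserved.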


Duality with urgent receive semantics enjoys the following properties: sys- 
tems with dual types fulfil progress (Theorem 1); behaviour (resp. progress) of 
a system is preserved by the substitution of a type with a subtype (Theorem 2) 
(resp. Theorem 3). A system enjoys progress if it reaches states that are either 
final or that allow further communications, possibly after a delay. Recall that 
we assume types to be well-formed (cf. Definition 1): Theorems 1, 2, and 3 rely 
on this assumption. 


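Definition 7 (Type progress). We say that a system $(\nu, S, M)$ is a success if
$S = \mathtt{end}$ and $M = \emptyset$. We say that $\mathcal{S}_1 \mid \mathcal{S}_2$ satisfies progress if:

$\mathcal{S}_1 \mid \mathcal{S}_2 \longrightarrow^* \mathcal{S}_1' \mid \mathcal{S}_2' \implies \mathcal{S}_1'$ and $\mathcal{S}_2'$ are success or $\exists t : \mathcal{S}_1' \mid \mathcal{S}_2' \xrightarrow{t\,\tau}$


Theorem 1 (Duality progress). System $(\nu_0, S, \emptyset) \mid (\nu_0, \overline{S}, \emptyset)$ enjoys
progress.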


We show that subtyping does not introduce new behaviour, via the usual
notion of timed simulation [1]. Let $c, c_1, c_2$ range over systems. Fix
$c_1 = (\nu_1^1, S_1^1, M_1^1) \mid (\nu_2^1, S_2^1, M_2^1)$ and $c_2 = (\nu_1^2, S_1^2, M_1^2) \mid (\nu_2^2, S_2^2, M_2^2)$. We say that a binary
relation over systems preserves end if: $S_i^1 = \mathtt{end} \wedge M_i^1 = \emptyset$ iff $S_i^2 = \mathtt{end} \wedge M_i^2 = \emptyset$
for all $i \in \{1, 2\}$. Write $c_1 \lesssim c_2$ if $(c_1, c_2)$ are in a timed simulation that preserves
end.

Theorem 2 (Safe substitution). If $S' <: S$, then $(\nu_0, S', \emptyset) \mid (\nu_0, \overline{S}, \emptyset) \lesssim (\nu_0, S, \emptyset) \mid (\nu_0, \overline{S}, \emptyset)$.


Theorem 3 (Progressing substitution). If $S' <: \overline{S}$, then $(\nu_0, S, \emptyset) \mid (\nu_0, S', \emptyset)$ satisfies progress.


5 A Calculus for Asynchronous Timed Processes 


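We introduce our asynchronous calculus for timed processes. The calculus
abstracts implementations that execute one or more sessions. We let $P, P', Q, \ldots$
range over processes, $X$ range over process variables, and define $n \in \mathbb{R}_{\geq 0} \cup \{\infty\}$.
We use the notation $\vec{a}$ for ordered sequences of channels or variables.

$$
\begin{array}{rcll}
P & ::= & \overline{a}v.P \mid a \triangleleft l.P \mid \mathtt{if}\ v\ \mathtt{then}\ P\ \mathtt{else}\ P \mid P \mathbin{|} P \mid 0 & \\
  & \mid & \mathtt{def}\ D\ \mathtt{in}\ P \mid X\langle \vec{a}; \vec{a} \rangle \mid (\nu ab)P \mid ab : h & \\
  & \mid & \mathtt{delay}(\delta).P \mid a^n(b).P \mid a^n \triangleright \{l_i : P_i\}_{i \in I} & \text{(time-consuming)} \\
  & \mid & \mathtt{failed} \mid \mathtt{delay}(t).P & \text{(run-time)} \\
D & ::= & X(\vec{a}; \vec{a}) = P & \\
h & ::= & \emptyset \mid h \cdot v \mid h \cdot a &
\end{array}
$$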


$\overline{a}v.P$ sends a value $v$ on channel $a$ and continues as $P$. Similarly, $a \triangleleft l.P$
sends a label $l$ on channel $a$ and continues as $P$. Process if $v$ then $P$ else $Q$
behaves as either $P$ or $Q$ depending on the boolean value $v$. Process $P \mid Q$ is
the parallel composition of $P$ and $Q$, and $0$ is the idle process. def $D$ in $P$ is
the standard recursive process: $D$ is a declaration, and $P$ is a process that may
contain recursive calls. In recursive calls $X\langle \vec{a}; \vec{a} \rangle$ the first list of parameters has
to be instantiated with values of ground types, while the second with channels.
Recursive calls are instantiated with equations $X(\vec{a}; \vec{a})$ in $D$. Process $(\nu ab)P$
is for scope restriction of endpoints $a$ and $b$. Process $ab : h$ is a queue with name
$ab$ (colloquially used to indicate that it contains messages in transit from $a$ to
$b$) and content $h$. $(\nu ab)$ binds endpoints $a$ and $b$, and queues $ab$ and $ba$ in $P$.

There are two kinds of time-consuming processes: those performing a
time-consuming action (e.g., method invocation, sleep), and those waiting to receive a
message. We model the first kind of processes with $\mathtt{delay}(\delta).P$, and the second
kind of processes with $a^n(b).P$ (receive) and $a^n \triangleright \{l_i : P_i\}_{i \in I}$ (branching). In
$\mathtt{delay}(\delta).P$, $\delta$ is a constraint as those defined for types, but on one single clock
$x$. The name of the clock here is immaterial: clock $x$ is used as a syntactic tool
to define intervals for the time-consuming (delay) action. In this sense, assume
$x$ is bound in $\mathtt{delay}(\delta).P$. Process $\mathtt{delay}(\delta).P$ consumes any amount of time $t$
such that $t$ is a solution of $\delta$. For example, $\mathtt{delay}(x < 3).P$ consumes any value
between 0 and 3 time units, then behaves as $P$. Process $a^n(b).P$ receives a message
on channel $a$, instantiates $b$ and continues as $P$. Parameter $n$ models different
receive primitives: non-blocking ($n = 0$), blocking ($n = \infty$), and blocking with
timeout ($n \in \mathbb{R}_{\geq 0}$). If $n \in \mathbb{R}_{\geq 0}$ and no message is in the queue, the process
waits $n$ time units before moving into a failed state. If $n$ is set to $\infty$ the process
models a blocking primitive without timeout. Branching process $a^n \triangleright \{l_i : P_i\}_{i \in I}$
is similar, but receives a label $l_i$ and continues as $P_i$.

Run-time processes are not written by programmers and only appear upon
execution. Process failed is the process that has violated a time constraint.
We say that $P$ is a failed state if it has failed as a syntactic sub-term. Process
$\mathtt{delay}(t).P$ delays for exactly $t$ time units.


Well-formed processes. Sessions are modelled as processes of the following form

$(\nu ab)(P \mid ab : h \mid ba : h')$

where $P$ is the process for endpoints $a$ and $b$, $ab$ is the queue for messages from $a$
to $b$, and $ba$ is the queue for messages from $b$ to $a$. A process can have more than
one ongoing session. For each, we expect that all necessary queues are present
and well-placed. We ensure that queues are well-placed via a well-formedness
property for processes (see [11] for an inductive definition). Well-formedness
rules out processes of the following form:

$(\nu ab)(a^n(c).(ba : h' \mid P) \mid Q \mid ab : h)$   (11)

The process in (11) is not well-formed since queue $ba$ for communications to
endpoint $a$ is not usable as it is in the continuation of the receive action.
Well-formedness of processes is necessary for our safety results. We check
well-formedness orthogonally to the typing system for the sake of simpler typing rules.
While well-formedness ensures the absence of misplaced queues, the presence of
an appropriate pair of queues for every session is ensured by the typing rules.


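Session creation. Usually well-formedness is ensured by construction, as sessions
are created by a specific (synchronous) reduction rule [10,21]. This kind of session
creation is cumbersome in the timed setting as it allows delays that are not
captured by protocols, hence well-typed processes may miss deadlines. Other
work on timed session types [12] avoids this problem by requiring that all session
creations occur before any delay action. Our calculus allows sessions to be created
at any point, even after delays. In (12) a session with endpoints $c$ and $d$ is created
after a send action (assume $P$ includes the queues for this new session).

$(\nu ab)(\overline{a}v.\mathtt{delay}(x < 3).(\nu cd)(P) \mid Q \mid ab : h \mid ba : h')$   (12)

A process like the one in (12) may be thought of as a dynamic session creation
that happens synchronously (as in [10,21]), but assuming that all participants
are ready to engage without delays. Our approach yields a simplification of
the calculus (syntax and reduction rules) and, yet, a more general treatment of
session initiation than the work in [12].

$$
\begin{array}{ll}
\dfrac{P \rightharpoonup P'}{P \longrightarrow P'} \qquad \dfrac{P \rightsquigarrow P'}{P \longrightarrow P'} & \text{[Red1/Red2]} \\[2ex]
\overline{a}v.P \mid ab : h \ \rightharpoonup\ P \mid ab : h \cdot v & \text{[Send]} \\
a^n(c).P \mid ba : v \cdot h \ \rightharpoonup\ P[v/c] \mid ba : h & \text{[Rcv]} \\
a \triangleleft l.P \mid ab : h \ \rightharpoonup\ P \mid ab : h \cdot l & \text{[Sel]} \\
a^n \triangleright \{l_i : P_i\}_{i \in I} \mid ba : l_j \cdot h \ \rightharpoonup\ P_j \mid ba : h \quad (j \in I) & \text{[Bra]} \\[1ex]
\dfrac{\models \delta[t/x]}{\mathtt{delay}(\delta).P \ \rightharpoonup\ \mathtt{delay}(t).P} & \text{[Det]} \\[2ex]
\mathtt{if}\ \mathtt{true}\ \mathtt{then}\ P\ \mathtt{else}\ Q \ \rightharpoonup\ P & \text{[IfT]} \\[1ex]
\dfrac{P \rightharpoonup P'}{P \mid Q \ \rightharpoonup\ P' \mid Q} \qquad \dfrac{P \rightharpoonup P'}{\mathtt{def}\ D\ \mathtt{in}\ P \ \rightharpoonup\ \mathtt{def}\ D\ \mathtt{in}\ P'} & \text{[Par/Def]} \\[2ex]
\mathtt{def}\ X(\vec{a}'; \vec{b}') = P'\ \mathtt{in}\ X\langle \vec{v}; \vec{b} \rangle \mid Q \ \rightharpoonup\ \mathtt{def}\ X(\vec{a}'; \vec{b}') = P'\ \mathtt{in}\ P'[\vec{v}, \vec{b} / \vec{a}', \vec{b}'] \mid Q & \text{[Rec]} \\[1ex]
\dfrac{P \equiv P' \quad P' \rightharpoonup Q' \quad Q' \equiv Q}{P \rightharpoonup Q} \qquad \dfrac{P \rightharpoonup P'}{(\nu ab)P \ \rightharpoonup\ (\nu ab)P'} & \text{[AStr/AScope]} \\[2ex]
\dfrac{P \equiv P' \quad P' \rightsquigarrow Q' \quad Q' \equiv Q}{P \rightsquigarrow Q} \qquad P \rightsquigarrow \Phi_t(P) & \text{[TStr/Delay]}
\end{array}
$$

Fig. 3. Reduction for processes (rule [IfF], symmetric to [IfT], is omitted).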


$$
\begin{array}{l}
\Phi_t(0) = 0 \qquad \Phi_t(ab : h) = ab : h \qquad \Phi_t(\mathtt{failed}) = \mathtt{failed} \\[1ex]
\Phi_t(P_1 \mid P_2) = \Phi_t(P_1) \mid \Phi_t(P_2) \quad \text{if } Wait(P_i) \cap NEQueue(P_j) = \emptyset,\ i \neq j \in \{1, 2\} \\[1ex]
\Phi_t(\mathtt{delay}(t').P) = \mathtt{delay}(t' - t).P \quad \text{if } t' \geq t \\[1ex]
\Phi_t(a^{t'}(a').P) = \begin{cases} a^{t' - t}(a').P & \text{if } t \leq t' \\ \mathtt{failed} & \text{otherwise} \end{cases} \\[1ex]
\Phi_t(a^{\infty}(a').P) = a^{\infty}(a').P \\[1ex]
\Phi_t((\nu ab)P) = (\nu ab)\,\Phi_t(P) \\[1ex]
\Phi_t(\mathtt{def}\ D\ \mathtt{in}\ P) = \mathtt{def}\ D\ \mathtt{in}\ \Phi_t(P)
\end{array}
$$

Fig. 4. Time-passing function $\Phi_t(P)$. Rule for $a^{t'} \triangleright \{l_i : P_i\}_{i \in I}$ is omitted for brevity.
$\Phi_t(P)$ is undefined in the remaining cases.

Reduction for processes. Processes are considered modulo structural equivalence,
denoted by $\equiv$, and defined by adding the following rule for delays to the standard
ones [28]: $\mathtt{delay}(0).P \equiv P$. Reduction rules for processes are given in Fig. 3. A
reduction step $\longrightarrow$ can happen because of either an instantaneous step $\rightharpoonup$ by
[Red1] or a time-consuming step $\rightsquigarrow$ by [Red2]. Rules [Send], [Rcv], [Sel], and [Bra]
are the usual asynchronous communication rules. Rule [Det] models the random
occurrence of a precise delay $t$, with $t$ being a solution of $\delta$. The other untimed
rules, [IfT], [Par], [Def], [Rec], [AStr], and [AScope] are standard. Note that rule
[Par] does not allow time passing, which is handled by rule [Delay]. Rule [TStr]
is the timed version of [AStr]. Rule [Delay] applies a time-passing function $\Phi_t$
(defined in Fig. 4) which distributes the delay $t$ across all the parts of a process.
$\Phi_t(P)$ is a partial function: it is undefined if $P$ can immediately make an urgent
action, such as evaluation of expressions or output actions. If $\Phi_t(P)$ is defined,
it returns the process resulting from letting $t$ time units elapse in $P$. $\Phi_t(P)$ may
return a failed state, if delay $t$ makes a deadline in $P$ expire. The definition
of $\Phi_t(P_1 \mid P_2)$ relies on two auxiliary functions: $Wait(P)$ and $NEQueue(P)$ (see
[11] for the full definition). $Wait(P)$ returns the set of channels on which $P$ (or
some syntactic sub-term of $P$) is waiting to receive a message/label. $NEQueue(P)$
returns the set of endpoints with a non-empty inbound queue. For example,
$Wait(a^t(b).Q) = Wait(a^t \triangleright \{l_i : P_i\}_{i \in I}) = \{a\}$ and $NEQueue(ba : h) = \{a\}$ given
that $h \neq \emptyset$. $\Phi_t(P_1 \mid P_2)$ is defined only if no urgent action could immediately
happen in $P_1 \mid P_2$. For example, $\Phi_t(P_1 \mid P_2)$ is undefined for $P_1 = a^t(b).Q$ and
$P_2 = ba : v$.
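
The time-passing function can be prototyped as a partial function that returns `None` where it is undefined. The sketch below covers only a fragment of the calculus (no restriction, recursion, or branching), and all class and function names are assumptions; in particular, `wait` here only looks at top-level parallel components, whereas the full definition of Wait is in [11].

```python
from dataclasses import dataclass
from math import inf
from typing import Optional, Tuple

@dataclass(frozen=True)
class Nil:            # idle process 0
    pass

@dataclass(frozen=True)
class Failed:         # run-time process `failed`
    pass

@dataclass(frozen=True)
class Queue:          # queue carrying messages from src to dst
    src: str
    dst: str
    content: Tuple[str, ...] = ()

@dataclass(frozen=True)
class Send:           # output prefix (an urgent action)
    chan: str
    value: str
    cont: object

@dataclass(frozen=True)
class Recv:           # receive on chan with timeout n (use math.inf for blocking)
    chan: str
    timeout: float
    var: str
    cont: object

@dataclass(frozen=True)
class Delay:          # delay(t).P with an exact duration t
    duration: float
    cont: object

@dataclass(frozen=True)
class Par:
    left: object
    right: object

def wait(p) -> set:
    """Channels on which a top-level component of p is waiting to receive."""
    if isinstance(p, Recv):
        return {p.chan}
    if isinstance(p, Par):
        return wait(p.left) | wait(p.right)
    return set()

def ne_queue(p) -> set:
    """Endpoints whose inbound queue is non-empty."""
    if isinstance(p, Queue):
        return {p.dst} if p.content else set()
    if isinstance(p, Par):
        return ne_queue(p.left) | ne_queue(p.right)
    return set()

def phi(t: float, p) -> Optional[object]:
    """Process after t time units, or None where the function is undefined."""
    if isinstance(p, (Nil, Failed, Queue)):
        return p
    if isinstance(p, Delay):
        return Delay(p.duration - t, p.cont) if p.duration >= t else None
    if isinstance(p, Recv):
        if p.timeout == inf:
            return p
        return Recv(p.chan, p.timeout - t, p.var, p.cont) if t <= p.timeout else Failed()
    if isinstance(p, Par):
        if wait(p.left) & ne_queue(p.right) or wait(p.right) & ne_queue(p.left):
            return None  # a message is ready for a waiting receiver: urgent
        left, right = phi(t, p.left), phi(t, p.right)
        return None if left is None or right is None else Par(left, right)
    return None  # e.g. Send: output actions are urgent
```

In this encoding a receiver on `b` composed with a non-empty queue towards `b` makes `phi` return `None`, while letting 3 time units pass over a non-blocking receive (timeout 0) with an empty queue yields `Failed()`.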

In the rest of this section we show the reductions of two processes: one with 
urgent actions (Example 5), and one to a failed state (Example 6). We omit 
processes that are immaterial for the illustration (e.g., unused queues). 


Example 5 (Urgency and undefined $\Phi_t$). We show the reduction of process
$P = (\nu ab)(\overline{a}\text{'Hi'}.Q \mid ab : \emptyset \mid b^{10}(c).P')$ that has an urgent action. Process $P$ can
make the following reduction by [Send]:

$P \rightharpoonup (\nu ab)(Q \mid ab : \text{'Hi'} \mid b^{10}(c).P')$

At this point, to apply rule [Delay], say with $t = 5$, we need to apply the
time-passing function as shown below:

$\Phi_5((\nu ab)(Q \mid ab : \text{'Hi'} \mid b^{10}(c).P')) = (\nu ab)(\Phi_5(Q) \mid \Phi_5(ab : \text{'Hi'} \mid b^{10}(c).P'))$

which is undefined: $\Phi_5(ab : \text{'Hi'} \mid b^{10}(c).P')$ is undefined because
$Wait(b^{10}(c).P') \cap NEQueue(ab : \text{'Hi'}) = \{b\} \neq \emptyset$. Hence rule [Delay] cannot be
applied. Instead, the message in queue $ab$ can be received by rule [Rcv]:

$(\nu ab)(Q \mid ab : \text{'Hi'} \mid b^{10}(c).P') \rightharpoonup (\nu ab)(Q \mid ab : \emptyset \mid P'[\text{'Hi'}/c])$


Example 6 (An execution with failure). We show a reduction to a failing state of
a process with a non-blocking receive action (expecting a message immediately)
composed with another process that sends a message after a delay.

$$
\begin{array}{lll}
 & \mathtt{delay}(x = 3).\overline{a}\text{'Hi'}.Q \mid ab : \emptyset \mid b^0(c).P & \text{apply [Det]} \\
\rightharpoonup & \mathtt{delay}(3).\overline{a}\text{'Hi'}.Q \mid ab : \emptyset \mid b^0(c).P = P' & \text{apply [Delay] with } t = 3 \\
\rightsquigarrow & \Phi_3(P') &
\end{array}
$$

The application of the time-passing function to $P'$ yields a failing state (a
message is not received in time) as shown below, where the second equality holds
since $Wait(b^0(c).P) \cap NEQueue(ab : \emptyset) = \emptyset$:

$$
\begin{array}{l}
\Phi_3(\mathtt{delay}(3).\overline{a}\text{'Hi'}.Q \mid b^0(c).P \mid ab : \emptyset) = \\
\Phi_3(\mathtt{delay}(3).\overline{a}\text{'Hi'}.Q) \mid \Phi_3(b^0(c).P) \mid \Phi_3(ab : \emptyset) = \\
\mathtt{delay}(0).\overline{a}\text{'Hi'}.Q \mid \mathtt{failed} \mid ab : \emptyset
\end{array}
$$


6 Typing for Asynchronous Timed Processes 


We validate programs against specifications using judgements of the form
$\Gamma \vdash P \triangleright \Delta$. Environments are defined as follows:

$$
\begin{array}{l}
\Delta ::= \emptyset \mid \Delta, a : (\nu, S) \mid \Delta, ab : M \qquad \Theta ::= \emptyset \mid \Theta \cup \{\Delta\} \\
\Gamma ::= \emptyset \mid \Gamma, a : T \mid \Gamma, X : (\vec{T}; \Theta)
\end{array}
$$

Environment $\Delta$ is a session environment, used to keep track of the ongoing
sessions. When $\Delta(a) = (\nu, S)$ it means that the process being validated is acting
as a role in session $a$ specified by $S$, and $\nu$ is the clock valuation describing a
(virtual) time in which the next action in $S$ may be executed. We write $dom(\Delta)$
for the set of variables and channels in $\Delta$. Environment $\Gamma$ maps variables $a$ to
sorts $T$ and process variables $X$ to pairs $(\vec{T}; \Theta)$, where $\vec{T}$ is a vector of sorts
and $\Theta$ is a set of session environments. The mapping of process variables is used
to type recursive processes: $\vec{T}$ is used to ensure well-typed instantiation of the
recursion parameters, and $\Theta$ is used to model the set of possible scenarios when
a new iteration begins.


Notation, assumptions, and auxiliary definitions. We write $\Delta + t$ for the session
environment obtained by incrementing all clock valuations in the codomain of
$\Delta$ by $t$.


Definition 8. We define the disjoint union $A \uplus B$ of sets of clocks $A$ and $B$ as:

$A \uplus B = \{in_l(x) \mid x \in A\} \cup \{in_r(x) \mid x \in B\}$

where $in_l$ and $in_r$ are one-to-one endofunctions on clocks and, for all $x \in A$ and
$y \in B$, $in_l(x) \neq in_r(y)$. With an abuse of notation, we define the disjoint union
of clock valuations $\nu_1, \nu_2$, in symbols $\nu_1 \uplus \nu_2$, as a clock valuation satisfying:

$(\nu_1 \uplus \nu_2)(in_l(x)) = \nu_1(x) \qquad (\nu_1 \uplus \nu_2)(in_r(x)) = \nu_2(x)$

We use the symbol $\biguplus$ for the iterated disjoint union.

For a configuration $(\nu, S)$ we define $val((\nu, S)) = \nu$, and $type((\nu, S)) = S$. We
overload function $val$ to session environments $\Delta$ as follows:

$val(\Delta) = \biguplus_{a \in dom(\Delta)} val(\Delta(a))$
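
One simple way to realise the injections of Definition 8 is to tag each clock with the side it comes from. This is only an illustrative sketch: the names `disjoint_union` and `val_env` are hypothetical, and tagging turns clock names into pairs rather than mapping clocks to clocks as the endofunctions in the definition do.

```python
from typing import Dict, Hashable

Valuation = Dict[Hashable, float]

def disjoint_union(v1: Valuation, v2: Valuation) -> Valuation:
    """Tag clocks with 'l' or 'r' so equal names in v1 and v2 never collide."""
    tagged: Valuation = {("l", x): t for x, t in v1.items()}
    tagged.update({("r", x): t for x, t in v2.items()})
    return tagged

def val_env(delta: Dict[str, Valuation]) -> Valuation:
    """Iterated disjoint union over a session environment.

    delta maps each endpoint to the valuation of its configuration; clocks
    are tagged with the endpoint name, which keeps all sessions apart.
    """
    return {(a, x): t for a, v in delta.items() for x, t in v.items()}
```

For instance, two sessions that both use a clock named `x` keep separate readings after the union, which is the property the definition is after.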
We require $\Theta$ to satisfy the following three conditions:

1. If $\Delta \in \Theta$ and $\Delta(a) = (\nu, S)$, then $S$ is well-formed (Definition 1) against $\nu$;
2. For all $\Delta_1 \in \Theta$, $\Delta_2 \in \Theta$: $type(\Delta_1(a)) = S$ iff $type(\Delta_2(a)) = S$;
3. There is a guard $\delta$ such that:

$\{\nu \mid \nu \models \delta\} = \bigcup_{\Delta \in \Theta} val(\Delta)$.

The last condition ensures that $\Theta$ is finitely representable, and is key for
decidability of type checking.

Example 7. We show some examples of $\Theta$ that do or do not satisfy the last
requirement above. Let $S_1 =\ !T(x < 2).\mathtt{end}$ and $S_2 =\ !T(y < 2).\mathtt{end}$, and let:

$$
\begin{array}{l}
\Theta_1 = \{\Delta \mid \Delta(a) = (\nu_1, S_1) \wedge \Delta(b) = (\nu_2, S_2) \wedge \nu_1(x) < 2 \wedge \nu_1(x) = \nu_2(y)\}; \\
\Theta_2 = \{\Delta \mid \Delta(a) = (\nu_1, S_1) \wedge \Delta(b) = (\nu_2, S_2) \wedge \nu_1(x) < \sqrt{2} \wedge \nu_1(x) = \nu_2(y)\}; \\
\Theta_3 = \{\Delta \mid \Delta(a) = (\nu_1, S_1) \wedge \Delta(b) = (\nu_2, S_2) \wedge \nu_1(x) + \nu_2(y) = 2\}.
\end{array}
$$

We have that $\Theta_1$ satisfies condition (3): let $\delta_1 = x < 2 \wedge y - x = 0$. It is easy to
see that $\{\nu \mid \nu \models \delta_1\} = \bigcup_{\Delta \in \Theta_1} val(\Delta)$. For $\Theta_2$, a candidate proposition would
be $\delta_2 = x < \sqrt{2} \wedge y - x = 0$. However, $\delta_2$ cannot be derived with the syntax of
propositions, as $\sqrt{2}$ is irrational. Indeed, $\Theta_2$ does not satisfy the condition. For
$\Theta_3$, let $\delta_3 = x + y = 2$. Again, $\delta_3$ is not a guard, as additive constraints of the
form $x + y = n$ are not allowed. Indeed, $\Theta_3$ does not satisfy the condition either.


In the following, we write $\vec{a} : \vec{T}$ for $a_1 : T_1, \ldots, a_n : T_n$ when $\vec{a} = a_1, \ldots, a_n$ and
$\vec{T} = T_1, \ldots, T_n$ (assuming $\vec{a}$ and $\vec{T}$ have the same number of elements). Similarly
for $\vec{b} : (\vec{\nu}, \vec{S})$. In the typing rules, we use a few auxiliary definitions: Definition 9
(t-reading $\Delta$) checks if any ongoing session in $\Delta$ can perform an input action
within a given timespan, and Definition 10 (Compatibility of configurations)
extends the notion of duality to systems that are not in an initial state.


Definition 9 (t-reading $\Delta$). Session environment $\Delta$ is t-reading if there exist
some $a \in dom(\Delta)$, $t' < t$ and $m$ such that: $\Delta(a) = (\nu, S) \wedge (\nu + t', S) \xrightarrow{?m}$

Namely, $\Delta$ is t-reading if any of the open sessions in the mapping prescribes a
read action within the time-frame between $\nu$ and $\nu + t$. Definition 9 is used in
the typing rules for time-consuming processes ([Vrcv], [Drcv], and [Delt]) to
'disallow' derivations when an (urgent) receive may happen.
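
For sessions whose next action is guarded by a closed interval on a single clock, the check of Definition 9 amounts to an interval-overlap test. The sketch below is an illustration under exactly that simplification; `Session` and `is_t_reading` are hypothetical names, and the closed bounds `lo <= x <= hi` are an assumption.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Session:
    clock: float    # current value of the session's single clock
    receives: bool  # True if the next action of the type is an input
    lo: float       # guard of the next action: lo <= x <= hi
    hi: float

def is_t_reading(delta: Dict[str, Session], t: float) -> bool:
    """True if some session can input at an instant clock + t' with 0 <= t' < t."""
    for s in delta.values():
        if not s.receives or t <= 0 or s.lo > s.hi:
            continue
        # The window [clock, clock + t) must overlap the guard [lo, hi].
        if s.clock <= s.hi and s.lo < s.clock + t:
            return True
    return False
```

With a receive guarded by `1 <= x <= 2` and the clock at 0, the environment is not 1-reading (the guard only opens at exactly 1) but it is 1.5-reading.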


Definition 10 (Compatibility of configurations). Configuration $(\nu_1, S_1, M_1)$
is compatible with $(\nu_2, S_2, M_2)$, written $(\nu_1, S_1, M_1) \perp (\nu_2, S_2, M_2)$, if:

1. $M_1 = \emptyset \vee M_2 = \emptyset$,
2. $\forall i \neq j \in \{1, 2\} : M_i = m; M_i' \implies \exists \nu_i', S_i', m' : (\nu_i, S_i) \xrightarrow{?m'} (\nu_i', S_i') \wedge m <: m' \wedge (\nu_i', S_i', M_i') \perp (\nu_j, S_j, M_j)$,
3. $M_1 = \emptyset \wedge M_2 = \emptyset \implies \nu_1 = \nu_2 \wedge S_1 = \overline{S_2}$.


By condition (3) initial configurations are compatible when they include dual
types, i.e., $(\nu_0, S, \emptyset) \perp (\nu_0, \overline{S}, \emptyset)$. By condition (2) two configurations may
temporarily misalign as execution proceeds: one may have read a message from
its queue, while the other has not, as long as the former is ready to receive it
immediately. Thanks to the particular shape of type's interactions, initial
configurations, of the form $(\nu_0, S, \emptyset) \perp (\nu_0, \overline{S}, \emptyset)$, will only reach systems, say
$(\nu_1, S_1, M_1) \perp (\nu_2, S_2, M_2)$, in which at least one of $M_1$ and $M_2$ is empty.
Condition (1) requires compatible configurations to satisfy this basic property.


Typing rules. The typing rules are given in Fig. 5. Rule [Vrcv] is for input
processes. The first premise consists of two conditions requiring the time-span
$[\nu, \nu + n]$ in which the process can receive the message to coincide with $\delta$:

- $\nu + t \models \delta \Rightarrow t \leq n$ rules out processes that are not ready to receive a message
  when prescribed by the type.
- $t \leq n \Rightarrow \nu + t \models \delta$ requires that $a^n(b).P$ can read only at times that satisfy
  the type prescription $\delta$.²

The second premise of [Vrcv] requires the continuation $P$ to be well-typed against
the continuation of the type, for all possible session environments where the
virtual time is somewhere in $[\nu, \nu + n]$, where the virtual valuation $\nu$ in the
mapping of session $a$ is reset according to $\lambda$. Rule [Drcv], for processes receiving
delegated sessions, is like [Vrcv] except: (a) the continuation $P$ is typed against
a session environment extended with the received session $S'$, and (b) the clock
valuation $\nu'$ of the receiving session must satisfy $\delta'$. Recall that by formation
rules (Sect. 2.1) $S'$ is well-formed against all $\nu'$ that satisfy $\delta'$.

Rule [Vsend] is for output processes. Send actions are instantaneous, hence
the current valuation $\nu$ of the type needs to satisfy $\delta$. As customary, the
continuation of the process needs to be well-typed against the continuation of the
type (with $\nu$ being reset according to $\lambda$, and $\Gamma$ extended with information on
the sort of $b$). [Dsend] for delegation is similar but: (a) the delegated session is
removed from the session environment (the process can no longer engage in the
delegated session), and (b) valuation $\nu'$ of the delegated session must satisfy
guard $\delta'$.

Rule [Delδ] checks that $P$ is well-typed against all possible solutions of $\delta$.
Rule [Delt] shifts the virtual valuations in the session environment by $t$. This is
as the corresponding rule in [12] but with the addition of the check that $\Delta$ is
not t-reading, needed because of urgent semantics.

Rule [Res] is for processes with scopes.


² While not necessary for our safety results, this constraint simplifies our theory.
Timing variations between types and programs are all handled in one place: rule [Subt].




Rule [Rec] is for recursive processes. The rule is as usual [21] except that 
we use a set of session environments O (instead of a single A) to capture a set 
of possible scenarios in which a recursion instance may start, which may have 
different clock valuations. Rule [Var] is also as expected except for the use of O. 

Rules [Par] and [Subt] are straightforward.


Example 8 (Typing with subtyping). Subtyping substantially increases the
power of our type system, in particular in the presence of channel passing.
Intuitively, without subtyping, the type of any higher-order send action should be an
equality constraint (e.g., $x = 1$) rather than a more general timeout (e.g., $x \leq 1$).
We illustrate our point using $P$ defined below:

$$
\begin{array}{ll}
P = (\nu a_1 b_1)(\nu a_2 b_2)(P_1 \mid P_2 \mid P_3 \mid Q) & P_1 = \mathtt{delay}(x \leq 1).\overline{a_1}\,a_2 \\
P_2 = b_1^{1}(c).c^{2}(d) & P_3 = \mathtt{delay}(1 \leq x \wedge x \leq 2).\overline{b_2}\,\mathtt{true}
\end{array}
$$

where $Q$ contains empty queues of the involved endpoints. Intuitively, $P$ proceeds
as follows: (1) $P_1$ sends channel $a_2$ to $P_2$ within one time unit, and terminates;
(2) $P_2$ reads the message as soon as it arrives, and listens for a message across the
received channel ($a_2$) for two time units; (3) $P_3$ sends value true through channel
$b_2$ at a time between 1 and 2, unaware that now she is communicating with
$P_2$, and then terminates; (4) $P_2$ reads the message immediately and terminates.
See below for one possible reduction:


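$$
\begin{array}{rl}
P \longrightarrow^* & (\nu a_1 b_1)(\nu a_2 b_2)(\overline{a_1}\,a_2 \mid b_1^{0}(c).c^{2}(d) \mid \mathtt{delay}(0 \leq x \wedge x \leq 1).\overline{b_2}\,\mathtt{true} \mid Q) \\
\longrightarrow^* & (\nu a_1 b_1)(\nu a_2 b_2)(0 \mid a_2^{2}(d) \mid \mathtt{delay}(0.5).\overline{b_2}\,\mathtt{true} \mid Q) \\
\longrightarrow & (\nu a_1 b_1)(\nu a_2 b_2)(0 \mid a_2^{1.5}(d) \mid \overline{b_2}\,\mathtt{true} \mid Q) \\
\longrightarrow^* & (\nu a_1 b_1)(\nu a_2 b_2)(0 \mid 0 \mid 0 \mid Q)
\end{array}
$$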

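Although $P$ executes correctly, the involved processes are well-typed against
types that are not dual:

$\emptyset \vdash P_1 \triangleright a_1 : (\nu_0, S_1), a_2 : (\nu_0, S_2) \qquad \emptyset \vdash P_2 \triangleright b_1 : (\nu_0, S_1') \qquad \emptyset \vdash P_3 \triangleright b_2 : (\nu_0, \overline{S_2})$

for $S_1 =\ !(y \leq 1, S_2)(x \leq 1)$, $S_2 =\ ?\mathtt{Bool}(1 \leq y \wedge y \leq 2)$, $S_1' =\ ?(y = 0, S_2')(x \leq 1)$.
In order to type check $P$, we need to apply rule [Res], requiring endpoints of the
same session to have dual types. But clearly: $S_1' \neq \overline{S_1}$. Without subtyping, $P$
would not be well-typed. By subtyping, however, $(y \leq 1, S_2) <: (y = 0, S_2')$ with
$S_2' =\ ?\mathtt{Bool}(y \leq 2).\mathtt{end}$, and then $S_1' <: \overline{S_1}$. Thanks to the subtyping rule [Subt]
we can derive $\emptyset \vdash P_2 \triangleright b_1 : (\nu_0, \overline{S_1})$ and, in turn, $\emptyset \vdash P \triangleright \emptyset$.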


[Figure: typing rules [Vrcv], [Drcv], [Vsend], [Dsend], [Delδ], [Delt], [Res], [Var], [Par], [Rec], and [Subt].]

Fig. 5. Selected typing rules for processes


7 Subject Reduction and Time Safety


The main properties of our typing system are Subject Reduction and Time
Safety. Time Safety ensures that the execution of well-typed processes will only
reach fail-free states. Recall, $P$ is fail-free when none of its sub-terms is the
process failed. Time Safety builds on a condition that is not related to time,
but to the structure of the process interactions. If an untimed process gets
stuck due to mismatches in its communication structure, a timed process with
the same communication structure may move to a failed state. Consider $P$ below:


$$
\begin{array}{l}
P = (\nu ab)(\nu cd)\,Q \qquad R = ab : \emptyset \mid ba : \emptyset \mid cd : \emptyset \mid dc : \emptyset \\
Q = a^{5}(e).\overline{d}e.0 \mid c^{5}(e).\overline{b}e.0 \mid R
\end{array}
\qquad (13)
$$

$P$ is well-typed: $\emptyset \vdash P \triangleright a : (\nu_0, S), b : (\nu_0, \overline{S}), c : (\nu_0, S), d : (\nu_0, \overline{S})$ with
$S =\ ?\mathtt{Int}(x \leq 5, \emptyset).\mathtt{end}$. However, $P$ can only make time steps, and when, overall,
more than 5 time units elapse (e.g., 6 in the reduction below) $P$ reaches a failed
state due to a circular dependency between actions of sessions $(\nu ab)$ and $(\nu cd)$:

$P \longrightarrow \Phi_6(P) = (\nu ab)(\nu cd)(\mathtt{failed} \mid \mathtt{failed} \mid R)$

Our typing system does not check against such circularities across different
interleaved sessions. This is common in work on untimed [21] and timed [12] session
types. However, in the untimed scenario, progress for interleaved sessions can be
guaranteed by means of additional checks on processes [17]. Time Safety builds
on the results in [17] by using an assumption (receive liveness) on the underlying
structure of the timed processes. This assumption is formally captured
in Definition 11, which is based on an untimed variant of our calculus.


The untimed calculus. We define untimed processes, denoted by $\hat{P}$, as processes
obtained from the grammar given for timed processes (Sect. 5) without delays
and failed processes. In untimed processes, time annotations of branching/receive
processes are immaterial, hence omitted in the rest of the paper.

Given a (timed) process $P$, one can obtain its untimed counterpart by erasing
delays and failed processes; we denote the result of such erasure on $P$ by
$erase(P)$. The semantics of untimed processes is defined as the one for timed
processes (Sect. 5) except that reduction rules [Delay], [TStr], and [Red2] are
removed. Abusing the notation, we write $\hat{P} \longrightarrow \hat{P}'$ when an untimed process $\hat{P}$
moves to a state $\hat{P}'$ using the semantics for untimed processes. The definitions of
$Wait(\hat{P})$ and $NEQueue(\hat{P})$ can be derived from the definitions for timed processes
in the straightforward way.
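
The erasure can be sketched as a structural traversal over a fragment of the calculus. The class names below are assumptions, timeouts are dropped by setting them to `None`, and mapping failed to the idle process is a choice made only for this illustration, since the text does not spell out how failed sub-terms are erased.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Nil:
    pass

@dataclass(frozen=True)
class Failed:
    pass

@dataclass(frozen=True)
class Send:
    chan: str
    value: str
    cont: object

@dataclass(frozen=True)
class Recv:
    chan: str
    timeout: Optional[float]  # None once the time annotation is erased
    var: str
    cont: object

@dataclass(frozen=True)
class Delay:
    duration: float
    cont: object

@dataclass(frozen=True)
class Par:
    left: object
    right: object

def erase(p):
    """Drop delays and time annotations; `failed` becomes 0 (an assumption)."""
    if isinstance(p, Delay):
        return erase(p.cont)
    if isinstance(p, Failed):
        return Nil()
    if isinstance(p, Recv):
        return Recv(p.chan, None, p.var, erase(p.cont))
    if isinstance(p, Send):
        return Send(p.chan, p.value, erase(p.cont))
    if isinstance(p, Par):
        return Par(erase(p.left), erase(p.right))
    return p
```

Erasing a delayed send yields the bare send, and erasing two timed receives in parallel keeps their communication structure while forgetting the timeouts.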

Definition 11 (receive liveness) formalises our assumption on the interaction 
structures of a process. 


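Definition 11 (Receive liveness). $\hat{P}$ is said to satisfy receive liveness (or is
live, for short) if, for all $\hat{P}'$ such that $\hat{P} \longrightarrow^* \hat{P}'$:

$\hat{P}' \equiv (\nu \vec{a}\vec{b})\hat{Q} \wedge a \in Wait(\hat{Q}) \implies \exists \hat{Q}' : \hat{Q} \longrightarrow^* \hat{Q}' \wedge a \in NEQueue(\hat{Q}')$

In any reachable state $\hat{P}'$ of a live untimed process $\hat{P}$, if any endpoint $a$ in $\hat{P}'$ is
waiting to receive a message ($a \in Wait(\hat{Q})$), then the overall process is able to
reach a state $\hat{Q}'$ where $a$ can perform the receive action ($a \in NEQueue(\hat{Q}')$).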

Consider process P in (13). The untimed process erase(P) is not live
because Wait(erase(P)) = {a, c} and a, c ∉ NEQueue(erase(P)), since
NEQueue(erase(P)) is the empty set. Syntactically, erase(P) is the same as P, but it
does not have the same behaviour: P can only make time steps, reaching a failed
process, while erase(P) is stuck, as untimed processes only make communication
steps.
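
As an illustration only, receive liveness can be checked mechanically once the reachable states of an untimed process are abstracted into a finite graph. The sketch below assumes such an abstraction is given; the names `step`, `wait`, and `ne_queue` are hypothetical stand-ins for the reduction relation, Wait, and NEQueue, and are not part of the calculus.

```python
from collections import deque

def reachable(start, step):
    """All states reachable from `start` in zero or more steps of `step`."""
    seen, todo = {start}, deque([start])
    while todo:
        s = todo.popleft()
        for t in step.get(s, ()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

def is_live(start, step, wait, ne_queue):
    """Receive liveness on a finite abstraction (an assumption of this sketch):
    from every reachable state, each waiting endpoint can reach a state
    in which its queue is non-empty."""
    for s in reachable(start, step):
        for a in wait.get(s, ()):
            if not any(a in ne_queue.get(t, ()) for t in reachable(s, step)):
                return False
    return True

# A stuck state in which endpoints a and c wait and no queue is ever filled.
stuck = is_live("s0", {}, {"s0": {"a", "c"}}, {})
# A state from which one step makes the queue of a non-empty.
fine = is_live("s0", {"s0": ["s1"]}, {"s0": {"a"}}, {"s1": {"a"}})
```

The first call mirrors the shape of the erase(P) example: two endpoints wait, NEQueue is empty, and no step is possible.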


Properties. Time safety relies on Subject Reduction (Theorem 4), which establishes
a relation (preserved by reduction) between well-typed processes and their types.


Theorem 4 (Subject reduction for closed systems). Let erase(P) be
live. If $\emptyset \vdash P \triangleright \emptyset$ and $P \to P'$, then $\emptyset \vdash P' \triangleright \emptyset$.


Note that Subject Reduction assumes erase(P) to be live. For instance, the 
example of P in (13) is well-typed, but erase(P) is not live. The process can 
reduce to a failed state (as illustrated earlier in this section) that cannot be 
typed (failed processes are not well-typed). Time Safety establishes that well-typed
processes only reduce to fail-free states.


Theorem 5 (Time safety). If erase(P) is live, $\emptyset \vdash P \triangleright \emptyset$, and $P \to^* P'$,
then P' is fail-free.


Typing is decidable if one uses processes annotated with the following information:
(1) scope restrictions $(\nu ab : S)P$ are annotated with the type S of the
session for endpoint a (the type of b is implicitly assumed to be $\overline{S}$ and both
endpoints are type checked in the initial clock valuation $\nu_0$); (2) receive actions
$a^n(b : T).P$ are annotated with the type T of the received message; (3) recursive
definitions $X(a : T ; a : S, \delta) = P$ are annotated with types for each parameter, and
a guard modelling the state of the clocks. We call annotated programs those
annotated processes derived without using productions marked as run-time (i.e.,
failed and delay(t). P), and where n in $a^n(b : T).P$ ranges over $\mathbb{Q}_{\geq 0} \cup \{\infty\}$.


Proposition 2. Type checking for annotated programs is decidable. 


8 Conclusion and Related Work 


We introduced duality and subtyping relations for asynchronous timed session 
types. Unlike for untimed and timed synchronous [6] dualities, the composition 
of dual types does not enjoy progress in general. Compositions of asynchronous 
timed dual types enjoy progress when using an urgent receive semantics. We 
propose a behavioural typing system for a timed calculus that features non- 
blocking and blocking receive primitives (with and without timeout), and time 
consuming primitives of arbitrary but constrained delays. The main properties 
of the typing system are Subject Reduction and Time Safety; both results rely
on an assumption (receive liveness) on the underlying interaction structure of
processes. In related work on timed session types [12], receive liveness is not
required for Subject Reduction; this is because the processes in [12] block (rather
than reaching a failed state) whenever they cannot progress correctly, hence,
e.g., missed deadlines are regarded as progress violations. By explicitly capturing
failures, our calculus paves the way for future work on combining static checking 
with run-time instrumentation to prevent or handle failures. 

Asynchronous timed session types have been introduced in [12], in a multiparty
setting, together with a timed π-calculus and a type system. The direct
extension of session types with time introduces unfeasible executions (i.e., types 
may get stuck), as we have shown in Example 1. [12] features a notion of fea- 
sibility for choreographies, which ensures that types enjoy progress. We ensure 
progress of types by formation and duality. The semantics of types in [12] is 
different from ours in that receive actions are not urgent. The work in [12] gives 
one extra condition on types (wait-freedom), because feasible types may still 
yield undesirable executions in well-typed processes. Thanks to our duality, sub- 
typing, and calculus (in particular the blocking receive primitive with timeout) 
this condition is unnecessary in this work. As a result, our typing system allows 
for types that are not wait-free. By dropping wait-freedom, we can type a class
of common real-world protocols in which processes may be ready to receive messages
even before the final deadline of the corresponding senders. Remarkably,
SMTP, mentioned in the introduction, is not wait-free. For some other aspects,
our work is less general than the one in [12], as we consider binary sessions rather 
than multiparty sessions. A theory of timed multiparty asynchronous protocols 
that encompasses the protocols in [12] and those considered here is an interesting 
future direction. The work in [6] introduces a theory of synchronous timed ses- 
sion types, based on a decidable notion of compatibility, called compliance, that 
ensures progress of types, and is equivalent to synchronous timed duality and 
subtyping in a precise sense [6]. Our duality and subtyping are similar to those 
in [6], but apply to the asynchronous scenario. The work in [15] introduces a 
typed calculus based on temporal session types. The temporal modalities in [15] 
can be used as a discrete model of time. Timed session types, thanks to clocks 
and resets, are able to model complex timed dependencies that temporal session 
types do not seem able to capture. Other work studies models for asynchronous 
timed interactions, e.g., Communicating Timed Automata [23] (CTA), timed 
Message Sequence Charts [2], but not their relationships with processes. The 
work in [5] introduces a refinement for CTA, and presents a notion of urgency
similar to the one used in this paper, which was also preliminarily studied in [29].

Several timed calculi have been introduced outside the context of behavioural 
types. The work in [32] extends the π-calculus with time primitives inspired by
CTA and is closer, in principle, to our types than our processes. Another timed
extension of the π-calculus with time-consuming actions has been applied to the
analysis of the active times of processes [18]. Some works focus on specific aspects
of timed behaviour, such as timeouts [9], transactions [24,27], and services [25].
Our calculus does not feature exception handlers, nor timed transactions. Our
focus is on detecting time violations via static typing, so that a process only
moves to fail-free states.

The calculi in [7,12,15] have been used in combination with session types.
The calculus in [12] features a non-blocking receive primitive similar to our
$a^0(b).P$, but that never fails (i.e., time is not allowed to flow if a process tries
to read from an empty buffer, possibly leading to a stuck process rather than
a failed state). The calculus in [7] features a blocking receive primitive without
timeout, equivalent to our $a^\infty(b).P$. The calculus in [15] seems able to encode
a non-blocking receive primitive like the one of [12] and a blocking receive primitive
without timeout like our $a^\infty(b).P$. None of these works features blocking
receive primitives with timeouts. Furthermore, existing works feature [7,12] or
can encode [15] only precise delays, equivalent to delay(x = n). P. Such punctual
predictions are often difficult to achieve. Arbitrary but constrained delays
are closer abstractions of time-consuming programming primitives (and possibly,
of predictions one can derive by cost analysis, e.g., [20]).

As to applications, timed session types have been used for run-time mon- 
itoring [7,30] and static checking [12]. A promising future direction is that of 
integrating static typing with run-time verification and enforcement, towards a 
theory of hybrid timed session types. In this context, extending our calculus with 
exception handlers [9,24,27] could allow an extension of the typing system that
introduces run-time instrumentation to handle unexpected time failures.


Abstract. Shared session types generalize the Curry-Howard correspondence
between intuitionistic linear logic and the session-typed π-calculus
with adjoint modalities that mediate between linear and shared session 
types, giving rise to a programming model where shared channels must 
be used according to a locking discipline of acquire-release. While this 
generalization greatly increases the range of programs that can be writ- 
ten, the gain in expressiveness comes at the cost of deadlock-freedom, a 
property which holds for many linear session type systems. In this paper, 
we develop a type system for logically-shared sessions in which types cap- 
ture not only the interactive behavior of processes but also constrain the 
order of resources (i.e., shared processes) they may acquire. This type- 
level information is then used to rule out cyclic dependencies among 
acquires and synchronization points, resulting in a system that ensures 
deadlock-free communication for well-typed processes in the presence of 
shared sessions, higher-order channel passing, and recursive processes. 
We illustrate our approach on a series of examples, showing that it rules 
out deadlocks in circular networks of both shared and linear recursive 
processes, while still being permissive enough to type concurrent imple- 
mentations of shared imperative data structures as processes. 


Keywords: Linear and shared session types - Deadlock-freedom 


1 Introduction 


Session types [25-27] naturally describe the interaction protocols that arise 
amongst concurrent processes that communicate via message-passing. This typ- 
ing discipline has been integrated (with varying static safety guarantees) into 
several mainstream languages such as Java [28,29], F# [43], Scala [49,50],
Go [11] and Rust [33]. Session types moreover enjoy a logical correspondence
between linear logic and the session-typed π-calculus [8,9,51,55]. Languages
building on this correspondence [24,52,55] not only guarantee session
fidelity (i.e., type preservation) but also deadlock-freedom (i.e., global progress).
The latter is guaranteed even in the presence of interleaved sessions, which 
are often excluded from the deadlock-free fragments of traditional session-typed 
frameworks [20, 26, 27,53]. These logical session types, however, exclude program- 
ming scenarios that demand sharing of mutable resources (e.g., shared databases 
or shared output devices) instead of functional resource replication. 

To increase their practicality, logical session types have been extended with 
manifest sharing [2]. In the resulting language, linear and shared sessions coex- 
ist, but the type system enforces that clients of shared sessions run in mutual 
exclusion of each other. This separation is achieved by enforcing an acquire- 
release policy, where a client of a shared session must first acquire the session 
before it can participate in it along a private linear channel. Conversely, when a 
client releases a session, it gives up its linear channel and only retains a shared 
reference to the session. Thus, sessions in the presence of manifest sharing can 
change, or shift, between shared and linear execution modes. At the type-level, 
the acquire-release policy manifests in a stratification of session types into linear 
and shared with adjoint modalities [5,47,48], connecting the two strata. Opera- 
tionally, the modality shifting up from the linear to the shared layer translates 
into an acquire and the one shifting down from shared to linear into a release. 

Manifest sharing greatly increases the range of programs that can be written 
because it recovers the expressiveness of the untyped asynchronous π-calculus [3]
while maintaining session fidelity. As in the π-calculus, however, the gain in
expressiveness comes at the cost of deadlock-freedom. An illustrative example is 
an implementation of the classical dining philosophers problem, shown in Fig. 1, 
using the language SILLs [2] that supports manifest sharing (in this setting we 
often equate a process with the session it offers along a distinguished channel). 
The code shows the process fork_proc, implementing a session of type sfork, and 
the processes thinking and eating, implementing sessions of type philosopher. We 
defer the details of the typing and the definition of the session types sfork and 
philosopher to Sect. 2 and focus on the programmatic working of the processes for 
now. For ease of reading, we typeset shared session types and variables denoting 
shared channel references in red. 

A fork_proc process represents a fork that can be perpetually acquired and 
released. The actions accept and detach are the duals of acquire and release, 
respectively, allowing a process to accept an acquire by a client and to initi- 
ate a release by a client, respectively. Process thinking has two shared channel 
references as arguments, for the forks to the left and right of the philosopher, 
which the process tries to acquire. If the acquire succeeds, the process recurs 
as an eating philosopher with two (now) linear channel references of type lfork.
Once a philosopher is done eating, it releases both forks and recurs as a thinking
philosopher. Let's set a table for three philosophers that share three forks, all
spawned as processes executing in parallel:


f0 ← fork_proc ; f1 ← fork_proc ; f2 ← fork_proc ;
p0 ← thinking ← f0, f1 ; p1 ← thinking ← f1, f2 ; p2 ← thinking ← f2, f0 ;


    fork_proc : {sfork}
    c ← fork_proc =
      c' ← accept c ;
      c ← detach c' ;
      c ← fork_proc

    thinking : {phil ← sfork, sfork}
    c ← thinking ← left, right =
      left' ← acquire left ;
      right' ← acquire right ;
      c ← eating ← left', right'

    eating : {phil ← lfork, lfork}
    c ← eating ← left', right' =
      right ← release right' ;
      left ← release left' ;
      c ← thinking ← left, right


Fig. 1. Dining philosophers in SILLs [2]. 


Infamously, this configuration may deadlock because of the circular dependency
between the acquires. We can break this cycle by changing the last line to p2 ←
thinking ← f0, f2, ensuring that forks are acquired in increasing order.
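
The effect of acquiring forks in increasing order can be made concrete with a small, hypothetical sketch (it is not part of SILLs): each philosopher contributes an edge from the fork it acquires first to the fork it acquires second, and a circular wait is possible only if the resulting graph has a cycle. The helper names `has_cycle` and `acquire_order` are illustrative assumptions.

```python
def has_cycle(edges):
    """Detect a cycle in a directed graph given as a set of (src, dst) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in graph}

    def visit(n):
        colour[n] = GREY
        for m in graph[n]:
            # A grey successor is on the current DFS path: back edge, hence a cycle.
            if colour[m] == GREY or (colour[m] == WHITE and visit(m)):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)

def acquire_order(philosophers):
    """Each philosopher acquires `first` and then `second`: edge first -> second."""
    return {(first, second) for first, second in philosophers}

# Arguments of the three `thinking` processes, before and after the fix.
cyclic_table = [("f0", "f1"), ("f1", "f2"), ("f2", "f0")]
ordered_table = [("f0", "f1"), ("f1", "f2"), ("f0", "f2")]
```

With `cyclic_table` the acquire graph contains the cycle f0, f1, f2, f0; with `ordered_table` every edge points to a larger fork index, so no cycle exists.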

Perhaps surprisingly, cyclic dependencies between acquire requests are not 
the only source of deadlocks. Fig. 2 gives an example, defining the processes 
owner and contester, which both have a shared channel reference to a common 
resource that can be perpetually acquired and released. Both processes acquire 
the shared resource, but additionally exchange the message ping. More pre- 
cisely, process owner spawns the process contester, acquires the shared resource, 
and only releases the resource after having received the message ping from the 
contester. Process contester, on the other hand, first attempts to acquire the 
resource and then sends the message ping to the owner. The program deadlocks 
if process owner acquires the resource first. In that case, process owner waits 
for process contester to send the message ping while process contester waits to 
acquire the resource held by process owner. We note that this deadlock arises 
in both synchronous and asynchronous semantics. 


    owner : {1 ← sres}
    o ← owner ← sr =
      c ← contester ← sr ;
      lr ← acquire sr ;
      case c of
      | ping → wait c ;
               sr ← release lr ; close o

    contester : {⊕{ping : 1} ← sres}
    c ← contester ← sr =
      lr ← acquire sr ;
      c.ping ;
      sr ← release lr ;
      close c


Fig. 2. Circular dependencies among acquire and synchronization actions. 
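

The deadlock between owner and contester can be reproduced on a much coarser model than SILLs. The following sketch is an assumption-laden abstraction: each process is reduced to a list of actions on one lock and one asynchronous message queue, spawning is ignored, and all interleavings are explored. The function name `deadlocks` and the action names are hypothetical.

```python
def deadlocks(owner, contester):
    """Return True if some interleaving reaches a state where no process can
    move although at least one of them has not finished."""
    progs = (owner, contester)
    start = ((0, 0), None, 0)  # program counters, lock holder, queued pings
    seen, todo = {start}, [start]
    while todo:
        pcs, holder, pings = todo.pop()
        succs = []
        for i in (0, 1):
            if pcs[i] == len(progs[i]):
                continue  # process i has finished
            act = progs[i][pcs[i]]
            nxt = tuple(pc + 1 if j == i else pc for j, pc in enumerate(pcs))
            if act == "acquire" and holder is None:
                succs.append((nxt, i, pings))
            elif act == "release" and holder == i:
                succs.append((nxt, None, pings))
            elif act == "send":
                succs.append((nxt, holder, pings + 1))
            elif act == "recv" and pings > 0:
                succs.append((nxt, holder, pings - 1))
        if not succs and any(pcs[i] < len(progs[i]) for i in (0, 1)):
            return True
        for s in succs:
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return False

owner = ["acquire", "recv", "release"]        # acquire, wait for ping, release
contester = ["acquire", "send", "release"]    # acquire, send ping, release
late_owner = ["recv", "acquire", "release"]   # variant: acquire after the ping
```

In this model `deadlocks(owner, contester)` finds the interleaving in which owner holds the resource while waiting for ping, whereas the variant `late_owner`, which acquires only after receiving ping, has no stuck interleaving.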


In this paper, we develop a type system for manifest sharing that rules out 
cycles between acquire requests and interdependencies between acquire requests 
and synchronization actions, detecting the two kinds of deadlocks explained 
above. In our type system, session types not only prescribe when resources must 
be acquired and released, but also the range of resources that may be acquired. To 
this end, we equip the type system with the notion of a world, an abstract value 
at which a process resides, and type processes relative to an acyclic ordering on 
worlds, akin to the partial-order based approaches of [34,37]. The contributions 
of this paper are:

– a characterization of the possible forms of deadlocks that can arise in shared
session types;

– the introduction of manifest deadlock-freedom, where resource dependencies
are manifest in the type structure via world modalities;

– its elaboration in the programming language SILLs+, resulting in a type
system, a synchronous operational semantics, and proofs of session fidelity
(preservation) and a strong form of progress that excludes all deadlocks;

– the novel abstraction of green and red arrows to reason about the interdependencies
between processes;

– an illustration of the concepts on various examples, including an extensive
comparison with related work.


This paper is structured as follows: Sect. 2 provides a short introduction
to manifest sharing. Sect. 3 develops the type system and dynamics of the language
SILLs+. Sect. 4 illustrates the introduced concepts on an extended example.
Sect. 5 discusses the meta-theoretical properties of SILLs+, emphasizing progress.
Sect. 6 compares with examples of related work and identifies future work. Sect. 7
discusses related work, and Sect. 8 concludes this paper.


2 Manifest Sharing 


In the previous section, we have already explored the programmatic workings of 
manifest sharing [2], which enforces an acquire-release policy on shared channel 
references. In this section, we clarify the typing of shared processes. 

A key contribution of manifest sharing is not only to support acquire-release 
as a programming primitive but also to make it manifest in the type system. 
Generalizing the idea of type stratification [5,47,48], session types are partitioned 
into a linear and shared layer with two adjoint modalities connecting the layers: 


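\[
\begin{array}{lcl}
A_S & \triangleq & \uparrow^S_L A_L \\
A_L, B_L & \triangleq & A_L \otimes B_L \mid \oplus\{\overline{l : A_L}\} \mid \&\{\overline{l : A_L}\} \mid A_L \multimap B_L \mid \exists x{:}A_S.\, B_L \mid \Pi x{:}A_S.\, B_L \mid 1 \mid \downarrow^S_L A_S
\end{array}
\]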


In the linear layer, we get the standard connectives of intuitionistic linear logic
($A_L \otimes B_L$, $A_L \multimap B_L$, $\oplus\{\overline{l : A_L}\}$, $\&\{\overline{l : A_L}\}$, and 1). These connectives are extended
with the modal operator $\downarrow^S_L A_S$, shifting down from the shared to the linear layer.
Similarly, in the shared layer, we have the operator $\uparrow^S_L A_L$, shifting up from the
linear to the shared layer. The former translates into a release (and, dually,
detach), the latter into an acquire (and, dually, accept). As a result, we obtain 
a system in which session types prescribe all forms of communication, including 
the acquisition and release of shared processes. 

Table 1 provides an overview of SILLs’s session types and their operational 
reading. Since SILLs is based on an intuitionistic interpretation of linear logic 
session types [8], types are expressed from the point of view of the providing pro- 
cess with the channel along which the process provides the session behavior being 
characterized by its session type. This choice avoids the explicit duality opera- 
tion present in original presentations of session types [25,26] and in those based 


Manifest Deadlock-Freedom for Shared Session Types 615 


Table 1. Session types in SILLs and their operational meaning.


Session type                          Process term
current            cont               current                       cont                 Description
c_L : ⊕{l : A_L}   c_L : A_Lh         c_L.l_h ; P                   P                    sends label l_h along c_L
                                      case c_L of l ⇒ Q             Q_h                  receives label l_h along c_L
c_L : &{l : A_L}   c_L : A_Lh         case c_L of l ⇒ P             P_h                  receives label l_h along c_L
                                      c_L.l_h ; Q                   Q                    sends label l_h along c_L
c_L : A_L ⊗ B_L    c_L : B_L          send c_L d_L ; P              P                    sends channel d_L : A_L along c_L
                                      y_L ← recv c_L ; Q_yL         [d_L/y_L] Q_yL       receives channel d_L : A_L along c_L
c_L : A_L ⊸ B_L    c_L : B_L          y_L ← recv c_L ; P_yL         [d_L/y_L] P_yL       receives channel d_L : A_L along c_L
                                      send c_L d_L ; Q              Q                    sends channel d_L : A_L along c_L
c_L : ∃x:A_S. B_L  c_L : B_L          send c_L d_S ; P              P                    sends channel d_S : A_S along c_L
                                      y_S ← recv c_L ; Q_yS         [d_S/y_S] Q_yS       receives channel d_S : A_S along c_L
c_L : Πx:A_S. B_L  c_L : B_L          y_S ← recv c_L ; P_yS         [d_S/y_S] P_yS       receives channel d_S : A_S along c_L
                                      send c_L d_S ; Q              Q                    sends channel d_S : A_S along c_L
c_L : 1            -                  close c_L                     -                    sends "end" along c_L
                                      wait c_L ; Q                  Q                    receives "end" along c_L
c_L : ↓^S_L A_S    c_S : A_S          x_S ← detach c_L ; P_xS       [c_S/x_S] P_xS       sends "detach c_S" along c_L
                                      x_S ← release c_L ; Q_xS      [c_S/x_S] Q_xS       receives "detach c_S" along c_L
c_S : ↑^S_L A_L    c_L : A_L          x_L ← accept c_S ; P_xL       [c_L/x_L] P_xL       receives "acquire c_L" along c_S
                                      x_L ← acquire c_S ; Q_xL      [c_L/x_L] Q_xL       sends "acquire c_L" along c_S


on classical linear logic [55]. Table 1 lists the points of view of the provider and 
client of a given connective in the first and second lines, respectively. Moreover, 
Table 1 gives for each connective its session type before and after the message 
exchange, along with their respective process terms. We can see that the process 
terms of a provider and a client for a given connective come in matching pairs, 
indicating that the participants’ views of the session change consistently. We 
use the subscripts L and S to distinguish between linear and shared channels, 
respectively. 

We are now able to give the session types of the processes fork_proc, thinking, 
and eating defined in the previous section: 


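\[
\begin{array}{lcl}
\mathsf{lfork} & = & \downarrow^S_L \mathsf{sfork} \\
\mathsf{sfork} & = & \uparrow^S_L \mathsf{lfork} \\
\mathsf{phil} & = & 1
\end{array}
\]


The mutually recursive session types lfork and sfork represent a fork that can
perpetually be acquired and released. We adopt an equi-recursive [14] interpretation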
for recursive session types, silently equating a recursive type with its unfolding 
and requiring types to be contractive [19]. 

We briefly discuss the typing and the dynamics of acquire-release. The typing 
and the dynamics of the residual linear connectives are standard, and we detail 
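them in the context of SILLs+ (see Sect. 3). As is usual for an intuitionistic
interpretation, each connective gives rise to a left and a right rule, denoting the
use and provision, respectively, of a session of the given type:


\[
\dfrac{\Gamma ; \cdot \vdash P_{x_L} :: (x_L : A_L)}
      {\Gamma \vdash x_L \leftarrow \mathsf{accept}\; x_S ; P_{x_L} :: (x_S : \uparrow^S_L A_L)}
\;(\text{T-}\uparrow^S_L\text{R})
\qquad
\dfrac{\Gamma, x_S : \uparrow^S_L A_L ; \Delta, x_L : A_L \vdash Q_{x_L} :: (z_L : C_L)}
      {\Gamma, x_S : \uparrow^S_L A_L ; \Delta \vdash x_L \leftarrow \mathsf{acquire}\; x_S ; Q_{x_L} :: (z_L : C_L)}
\;(\text{T-}\uparrow^S_L\text{L})
\]

\[
\dfrac{\Gamma \vdash P_{x_S} :: (x_S : A_S)}
      {\Gamma ; \cdot \vdash x_S \leftarrow \mathsf{detach}\; x_L ; P_{x_S} :: (x_L : \downarrow^S_L A_S)}
\;(\text{T-}\downarrow^S_L\text{R})
\qquad
\dfrac{\Gamma, x_S : A_S ; \Delta \vdash Q_{x_S} :: (z_L : C_L)}
      {\Gamma ; \Delta, x_L : \downarrow^S_L A_S \vdash x_S \leftarrow \mathsf{release}\; x_L ; Q_{x_S} :: (z_L : C_L)}
\;(\text{T-}\downarrow^S_L\text{L})
\]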


The typing judgments $\Gamma \vdash P :: (x_S : A_S)$ and $\Gamma ; \Delta \vdash P :: (x_L : A_L)$ indicate that
process P provides a session of type A along channel x, given the typing of the
channels specified in typing contexts $\Gamma$ (and $\Delta$). $\Gamma$ and $\Delta$ consist of hypotheses
on the typing of shared and linear channels, respectively, where $\Gamma$ is a structural
and $\Delta$ a linear context. To allow for recursive process definitions, the typing
judgment depends on a signature $\Sigma$ that is populated with all process definitions
prior to type-checking. The adjoint formulation precludes shared processes
from depending on linear channel references [2,47], a restriction motivated by
logic and referred to as the independence principle [47]. Thus, when a shared session
accepts an acquire and shifts to linear, it starts with an empty linear context.

Operationally, the dynamics of SILLs is captured by multiset rewriting
rules [12], which denote computation in terms of state transitions between
configurations of processes. Multiset rewriting rules are local in that they only
mention the parts of a configuration they rewrite. For acquire-release we have the
following:

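\[
\begin{array}{ll}
(\text{D-}\uparrow^S_L) &
\mathsf{proc}(a_S, x_L \leftarrow \mathsf{accept}\; a_S ; P_{x_L}),\;
\mathsf{proc}(c_L, x_L \leftarrow \mathsf{acquire}\; a_S ; Q_{x_L}) \\
& \quad\longrightarrow\;
\mathsf{proc}(a_L, [a_L/x_L]\, P_{x_L}),\;
\mathsf{proc}(c_L, [a_L/x_L]\, Q_{x_L}),\;
\mathsf{unavail}(a_S) \\[1ex]
(\text{D-}\downarrow^S_L) &
\mathsf{proc}(a_L, x_S \leftarrow \mathsf{detach}\; a_L ; P_{x_S}),\;
\mathsf{proc}(c_L, x_S \leftarrow \mathsf{release}\; a_L ; Q_{x_S}),\;
\mathsf{unavail}(a_S) \\
& \quad\longrightarrow\;
\mathsf{proc}(a_S, [a_S/x_S]\, P_{x_S}),\;
\mathsf{proc}(c_L, [a_S/x_S]\, Q_{x_S})
\end{array}
\]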


Configuration states are defined by the predicates $\mathsf{proc}(c_m, P)$ and $\mathsf{unavail}(a_S)$.
The former denotes a running process with process term P providing along
channel $c_m$, the latter acts as a placeholder for a shared process providing along
channel $a_S$ that is currently not available. The above rules exploit the invariant
that a process' providing channel a can appear at one of two modes, a linear
one, $a_L$, and a shared one, $a_S$. While the process (i.e., the session) is linear, it
provides along $a_L$; while it is shared, along $a_S$. When a process shifts between
modes, it switches between the two modes of its offering channel. The channel at
the appropriate mode is substituted for the variables occurring in process terms.
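
To make the role of the unavail placeholder tangible, here is a deliberately simplified sketch of the two rewriting steps. It is an assumption of this sketch that a configuration is a set of facts and that process terms and the acquiring client are omitted entirely; the encoding of the linear mode of channel `a` as the string `a + "_L"` is likewise hypothetical.

```python
def acquire(config, shared):
    """Sketch of the acquire step: the process offering `shared` accepts, shifts
    to its linear mode, and an unavail fact records that the lock is taken."""
    if ("proc", shared) not in config:
        return None  # no accepting process is available: the acquire blocks
    linear = shared + "_L"
    return (config - {("proc", shared)}) | {("proc", linear), ("unavail", shared)}

def release(config, shared):
    """Sketch of the release step: the linear process detaches, shifts back to
    its shared mode, and the unavail placeholder is consumed."""
    linear = shared + "_L"
    if ("proc", linear) not in config or ("unavail", shared) not in config:
        return None
    return (config - {("proc", linear), ("unavail", shared)}) | {("proc", shared)}

c0 = frozenset({("proc", "a")})
c1 = acquire(c0, "a")    # a is now linear and marked unavailable
c2 = acquire(c1, "a")    # a second acquire cannot fire while a is unavailable
c3 = release(c1, "a")    # back to the initial configuration
```

The second acquire returning `None` reflects that, while the shared process is in its linear mode, no accepting process is present in the configuration and further clients must wait.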


3 Manifest Deadlock-Freedom 


In this section, we introduce our language SILLs+, a session-typed language 
that supports sharing without deadlock. We focus on SILLs+’s type system and 
dynamics in this section and discuss its meta-theoretical properties in Sect. 5.


3.1 Competition and Collaboration


The introduction of acquire-release, to ensure that the multiple clients of a shared 
process interact with the process in mutual exclusion from each other, gives rise 
to an obvious source of deadlocks, as acquire-release effectively amounts to a 
locking discipline. The typical approach to prevent deadlocks in that case is to 
impose a partial order on the resources and to “lock-up”, i.e., to lock the resources 
in ascending order. We adopted this strategy in Sect. 1 (Fig. 1) to break the cyclic 
dependencies among the acquires in the dining philosophers. 

In Sect. 1, however, we also considered another example (Fig. 2) and discov- 
ered that cyclic acquisitions are not the only source of deadlocks, but deadlocks 
can also arise from interdependent acquisitions and synchronizations. In that 
example, we can prevent the deadlock by moving the acquire past the synchronization
in either of the two processes. Whereas in a purely linear session-typed
system the sequencing of actions within a process does not affect other processes,
the relative placement of acquire requests and synchronizations becomes relevant
in a shared session-typed system.

Based on this observation, we can divide the processes in a shared-session 
discipline into competitors and collaborators. The former compete for a set of 
resources, whereas the latter do not overlap in the set of resources they acquire. 
For example, in the dining philosophers (Fig. 1), the philosophers p0, p1, and p2
compete with each other for the set of forks f0, f1, and f2, whereas the process
that spawns the philosophers and the forks collaborates with either of them. 

Transferring this idea to the process graph that emerges at run-time, we note 
that competitors are siblings whereas collaborators stand in a parent-descendant 
relationship. We illustrate this outcome in Fig. 3, which shows a possible run-time
process graph for the dining philosophers. Linear processes are depicted as
solid black circles with a white identifier and shared processes are depicted as
dotted filled violet circles with a black identifier. Linear channels are depicted as
black lines, shared channel references as dotted violet lines with the arrow head
pointing to the shared process being acquired.¹ The identifiers P0, P1, and P2
stand for the three philosophers, F0, F1, and F2 for the three forks, and T for
the process that sets the table. The current run-time graph depicts the scenario
in which P1 is eating, while the other two philosophers are still thinking.

Embedded in the graph is a tree that arises from the linear processes and the 
linear channels connecting them. For any two nodes in this tree, the parent node 
denotes the client process and the child node the providing process. We note 
that the independence principle (see Sect.2), which precludes shared processes 
from depending on linear channel references, guarantees that there exists exactly 
one tree in the process graph, with the linear main process as its root. The shape 
of the tree changes when new processes are spawned, linear channels exchanged 
(through ⊗ and ⊸), or shared processes acquired. For example, process P2 could
acquire the shared fork F0, which then becomes a linear child process of P2,
should the acquire succeed. As indicated by the shared channel references, the


1 We have made sure to make the different concepts distinguishable in greyscale mode.


[Figure: run-time process graph. Legend: linear process (child: provider, parent: client); shared process; linear channel; shared channel reference.]

Fig. 3. Run-time process graph for dining philosophers (see Fig. 1).


sibling nodes P0, P1, and P2 compete with each other for the nodes F0, F1, and
F2, whereas the node T does not compete for any of the resources acquired by
its descendants (including F1 and F2). Our type system enforces this paradigm,
as we discuss in the next section.


3.2 Type System 


Invariants. Having identified the notions of collaborators and competitors, our 
type system must guarantee: (i) that collaborators acquire mutually disjoint sets 
of resources; (ii) that competitors employ a locking-up strategy for the resources 
they share; and, (iii) that competitors have released all acquired resources when 
synchronizing with other competitors. Invariant (ii) rules out cyclic acquisitions 
and invariants (i) and (iii) combined rule out interdependent acquisitions and 
synchronizations. 

To express the high-level invariants above in our type system, we introduce 
the notion of a world — an abstract value that is equipped with a partial order — 
and associate such a world with every process. Programmers can create worlds, 
indicate the world at which a process resides at spawn time, and define an order 
on worlds. Moreover, we associate with each process a range of worlds that 
indicates the worlds of resources that the process may acquire. As a result, we 
obtain the following typing judgments: 


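\[
\begin{array}{ll}
\Psi; \Gamma \vdash P :: (x_S : A_S[\omega_k\,{\updownarrow}^{\omega_n}_{\omega_l}]) & \text{(where $\Psi^+$ irreflexive)} \\
\Psi; \Gamma; \Phi; \Delta \vdash P :: (x_L : A_L[\omega_k\,{\updownarrow}^{\omega_n}_{\omega_l}]) & \text{(where $\Psi^+$ irreflexive)}
\end{array}
\]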


The typing judgments reveal that we impose worlds at the judgmental level, 
resulting in a hybrid system, in which the adjoint modalities for acquire-release 
are complemented with world modalities that occur as syntactic objects in propositions [7].
We use the notation $x_m : A_m[\omega_k\,{\updownarrow}^{\omega_n}_{\omega_l}]$ (where m stands for S or L)
to associate worlds $\omega_k$, $\omega_l$, and $\omega_n$ with a process that offers a session of type
$A_m$ along channel $x_m$. World $\omega_k$ denotes the world at which the process resides.
We refer to this world as the self world. Worlds $\omega_l$ and $\omega_n$ indicate the range of
worlds of resources that the process may acquire, with $\omega_l$ denoting the minimal
(min) world in this range and $\omega_n$ the maximal (max) one.

Process terms are typed relative to the order specified in $\Psi$ and the contexts
$\Gamma$, $\Phi$, and $\Delta$. As in Sect. 2, $\Gamma$ is a structural context consisting of hypotheses
on the typing of variables bound to shared channel references, augmented with
world annotations. We find it necessary to split the linear context "$\Delta$" from
Sect. 2 into the two disjoint contexts $\Phi$ and $\Delta$, allowing us to separate channels
that are possibly aliased (due to sharing) from those that are not, respectively.
Both $\Phi$ and $\Delta$ consist of hypotheses on the typing of variables that are bound
to linear channels, augmented with world annotations. $\Psi$ is presupposed to be
acyclic and defined as: $\Psi \triangleq \cdot \mid \Psi', \omega_k < \omega_l \mid \Psi', \omega$, where $\omega$ stands for a
concrete world w or a world variable $\delta$. We allow $\Psi$ to contain single worlds,
to support singletons as well as to accommodate world creation prior to order
declaration. We define the transitive closure $\Psi^+$, yielding a strict partial order,
and the reflexive transitive closure $\Psi^*$, yielding a partial order.
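
The two closures can be computed directly on a finite set of order declarations. The sketch below is illustrative only: it assumes the declared order is given as a set of pairs `(lower, higher)` of world names, and the function names are hypothetical.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def is_strict_partial_order(pairs):
    """The transitive closure is a strict partial order iff it is irreflexive,
    which is the case exactly when the declared order is acyclic."""
    return all(a != b for a, b in transitive_closure(pairs))

def reflexive_transitive_closure(worlds, pairs):
    """Transitive closure extended with (w, w) for every declared world w."""
    return transitive_closure(pairs) | {(w, w) for w in worlds}

declared = {("w1", "w2"), ("w2", "w3")}
cyclic = {("w1", "w2"), ("w2", "w1")}
```

Here `declared` yields a strict partial order that also relates w1 and w3, whereas `cyclic` is rejected because its closure relates w1 to itself.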

The high-level invariants (i), (ii), and (iii) identified earlier naturally transcribe
into the following invariants, which we impose on the typing judgments
above. We use the notation $(x_m); P$ to denote a process term that currently
executes an action along channel $x_m$.


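1. min(parent) ≤ self(acquired_child) ≤ max(parent):
   $\forall y_L : B_L[\omega_o\,{\updownarrow}^{\omega_q}_{\omega_p}] \in \Phi : \Psi^* \vdash \omega_l \leq \omega_o \leq \omega_n$

2. max(parent) < min(child):
   $\forall y_L : B_L[\omega_o\,{\updownarrow}^{\omega_q}_{\omega_p}] \in \Delta \cup \Phi : \Psi^+ \vdash \omega_n < \omega_p$

3. If $\Psi; \Gamma, x_S : \uparrow^S_L A_L[\omega_t\,{\updownarrow}^{\omega_v}_{\omega_u}]; \Phi; \Delta \vdash x_L \leftarrow \mathsf{acquire}\; x_S ; Q_{x_L} :: (z_L : C_L[\omega_k\,{\updownarrow}^{\omega_n}_{\omega_l}])$, then
   $\forall y_L : B_L[\omega_o\,{\updownarrow}^{\omega_q}_{\omega_p}] \in \Phi : \Psi^+ \vdash \omega_o < \omega_t$.

4. If $\Psi; \Gamma; \Phi; \Delta \vdash (x_m); P :: (x_m : A_m[\omega_k\,{\updownarrow}^{\omega_n}_{\omega_l}])$, then $\Phi = (\cdot)$.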


Invariants 1 and 2 ensure that, for any node in the tree, the acquired resources 
reside at smaller worlds than those acquired by any descendant. As a result, the 
two invariants guarantee high-level invariant (i). Invariant 3, on the other hand,
imposes a lock-up strategy on acquires and thus guarantees high-level invariant
(ii). To guarantee high-level invariant (iii), we impose Invariant 4, which forces a
process to release any acquired resources before communicating along its offering 
channel. Since sibling nodes cannot be directly connected by a linear channel, 
the only way for them to synchronize is through a common parent. Finally, to
guarantee that world annotations are internally consistent, we require for each
annotation [ω_k ↕^{ω_n}_{ω_l}] that ω_k ≤ ω_l ≤ ω_n.


Rules. We now present selected process typing rules; a complete listing is provided
in the companion technical report [4]. The only new rules with respect to the
language SILLs [2] are those pertaining to world creation and order determination.
These are extra-logical judgmental rules. We allow both linear and shared
processes to create and relate worlds. Rules (T-NEW_L) and (T-NEW_S) create a
new world w and make it available to the continuation Q_w. Rules (T-ORD_L) and
(T-ORD_S) relate two existing worlds, while preserving acyclicity of the order.
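  Ψ, w; Γ; Φ; Δ ⊢ Q_w :: (x_L : A_L[ω_m ↕^{ω_v}_{ω_u}])
  ---------------------------------------------------------------------- (T-NEW_L)
  Ψ; Γ; Φ; Δ ⊢ w ← new_world; Q_w :: (x_L : A_L[ω_m ↕^{ω_v}_{ω_u}])

  Ψ, w; Γ ⊢ Q_w :: (x_S : A_S[ω_m ↕^{ω_v}_{ω_u}])
  ---------------------------------------------------------------------- (T-NEW_S)
  Ψ; Γ ⊢ w ← new_world; Q_w :: (x_S : A_S[ω_m ↕^{ω_v}_{ω_u}])

  ω_p, ω_r ∈ Ψ    (Ψ, ω_p < ω_r)+ irreflexive
  Ψ, ω_p < ω_r; Γ; Φ; Δ ⊢ Q :: (x_L : A_L[ω_m ↕^{ω_v}_{ω_u}])
  ---------------------------------------------------------------------- (T-ORD_L)
  Ψ; Γ; Φ; Δ ⊢ ω_p < ω_r; Q :: (x_L : A_L[ω_m ↕^{ω_v}_{ω_u}])

  ω_p, ω_r ∈ Ψ    (Ψ, ω_p < ω_r)+ irreflexive
  Ψ, ω_p < ω_r; Γ ⊢ Q :: (x_S : A_S[ω_m ↕^{ω_v}_{ω_u}])
  ---------------------------------------------------------------------- (T-ORD_S)
  Ψ; Γ ⊢ ω_p < ω_r; Q :: (x_S : A_S[ω_m ↕^{ω_v}_{ω_u}])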


We now consider the typing rule for acquire, which must explicitly enforce the 
various low-level invariants above. Since an acquire results in the addition of a 
new child node to the executing process, the rule can interfere with Invariants 1 
and 2. The first two premises of the rule ensure that the two invariants are 
preserved. Moreover, the rule has to ensure that the acquiring process is locking- 
up (Invariant 3), which is achieved by the third premise. 


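  Ψ* ⊢ ω_k ≤ ω_m ≤ ω_n    Ψ+ ⊢ ω_n < ω_u    ∀ y_L : B_L[ω_o ↕^{ω_r}_{ω_p}] ∈ Φ : Ψ+ ⊢ ω_o < ω_m
  Ψ; Γ, x_S : ↑^S_L A_L[ω_m ↕^{ω_v}_{ω_u}]; Φ, x_L : A_L[ω_m ↕^{ω_v}_{ω_u}]; Δ ⊢ Q_{x_L} :: (z_L : C_L[ω_j ↕^{ω_n}_{ω_k}])
  ---------------------------------------------------------------------- (T-↑^S_L L)
  Ψ; Γ, x_S : ↑^S_L A_L[ω_m ↕^{ω_v}_{ω_u}]; Φ; Δ ⊢ x_L ← acquire x_S; Q_{x_L} :: (z_L : C_L[ω_j ↕^{ω_n}_{ω_k}])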


The remaining shift rules are actually unchanged with respect to SILLs, modulo
the world annotations. In particular, low-level Invariant 4 is already satisfied
because the conclusion of rule (T-↑^S_L R) does not have a context Φ and because
the independence principle forces Φ to be empty in rule (T-↓^S_L R).

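  Ψ; Γ; ·; · ⊢ P_{x_L} :: (x_L : A_L[ω_m ↕^{ω_v}_{ω_u}])
  ---------------------------------------------------------------------- (T-↑^S_L R)
  Ψ; Γ ⊢ x_L ← accept x_S; P_{x_L} :: (x_S : ↑^S_L A_L[ω_m ↕^{ω_v}_{ω_u}])

  Ψ; Γ, x_S : A_S[ω_m ↕^{ω_v}_{ω_u}]; Φ; Δ ⊢ Q_{x_S} :: (z_L : C_L[ω_j ↕^{ω_n}_{ω_k}])
  ---------------------------------------------------------------------- (T-↓^S_L L)
  Ψ; Γ; Φ, x_L : ↓^S_L A_S[ω_m ↕^{ω_v}_{ω_u}]; Δ ⊢ x_S ← release x_L; Q_{x_S} :: (z_L : C_L[ω_j ↕^{ω_n}_{ω_k}])

  Ψ; Γ ⊢ P_{x_S} :: (x_S : A_S[ω_m ↕^{ω_v}_{ω_u}])
  ---------------------------------------------------------------------- (T-↓^S_L R)
  Ψ; Γ; ·; · ⊢ x_S ← detach x_L; P_{x_S} :: (x_L : ↓^S_L A_S[ω_m ↕^{ω_v}_{ω_u}])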


We now consider the linear connectives, starting with 1. Rule (T-1_R) reveals
that only processes that have never been acquired may be terminated. This
restriction is important to guarantee progress because existing clients of a shared
process may wait indefinitely otherwise. We impose the restriction as a
well-formedness condition on a session type, giving rise to a strictly equi-synchronizing
session type. The notion of an equi-synchronizing session type [2] has been
defined for SILLs and guarantees that a process that has been acquired at a
type A_S is released back to the type A_S, should it ever be released. A strictly
equi-synchronizing session type additionally requires that an acquired resource
must be released. The corresponding rules can be found in [4]. Linearity enforces
Invariant 4 in rule (T-1_R), making sure that no linear channels are left behind.


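  Ψ; Γ; Φ; Δ ⊢ Q :: (z_L : C_L[ω_j ↕^{ω_n}_{ω_k}])
  ---------------------------------------------------------------------- (T-1_L)
  Ψ; Γ; Φ; Δ, x_L : 1[ω_m ↕^{ω_v}_{ω_u}] ⊢ wait x_L; Q :: (z_L : C_L[ω_j ↕^{ω_n}_{ω_k}])

  ---------------------------------------------------------------------- (T-1_R)
  Ψ; Γ; ·; · ⊢ close x_L :: (x_L : 1[ω_m ↕^{ω_v}_{ω_u}])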


Next, we consider internal and external choice. Since internal and external
choice cannot alter the linear process tree of a process graph, the rules are very
similar to the ones in SILLs. The only differences are that we get two left rules
for each connective and that the Φ-context of each right rule must be empty to
satisfy Invariant 4. The former is merely due to the tracking of possibly aliased
sessions in the Φ context. We only list rules for internal choice; those for external
choice are dual and can be found in [4].

(Vi) Y; T; $; A, xı : An [wm Tee] F Qs =: Cilo; Ter 
P; T; @; A,x : D{l : A lwm |e] F case zı of l> Qs: (a: N [w; tgz] 
(Vi) Y; T; D, x : AL [wm [22]; AF Qi i (av: Clw ter] (T ) 
T. A “WL 
Y; T; D, x: O{l: Ai} [wm [22]; AF case x of T> Q : (z : Ofw; [22 ]) i 
P; Ie ARP: (a. Anume) 
G: I; Abadn; Ps: (av: Awa [82 ]) 


(T- Li) 


More interesting are linear channel output and input, since these alter the 
linear process tree of a process graph. Moreover, additional world annotations 
are needed to indicate the worlds of the channel that is exchanged. For the 
latter we use the notation @ω_l ↕^{ω_r}_{ω_p}, indicating that the exchanged channel has
the worlds ω_l, ω_p, and ω_r for self, min, and max, respectively. To account for
induced changes in the process graph, the rules that type an input of a linear 
channel must guard against any disturbance of Invariants 1 and 2. Because the 
two invariants guarantee that parents do not overlap with their descendants in 
terms of acquired resources, they prevent any exchange of acquired channels. 
We thus restrict ⊗ and ⊸ to the exchange of channels that have not yet been
acquired. This is not a limitation since, as we will see below, shared channel 
output and input are unrestricted. 

Even with the above restriction in place, we still have to make sure that a
received channel satisfies Invariant 2. If we were to state a corresponding premise
on the receiving rules, invertibility of the rules would be disturbed. To uphold
invertibility, we impose a well-formedness condition on session types that ensures
for a session of type A_L @ω_l ↕^{ω_r}_{ω_p} ⊗ B_L[ω_m ↕^{ω_v}_{ω_u}] that ω_v < ω_p and, analogously, for
a session of type A_L @ω_l ↕^{ω_r}_{ω_p} ⊸ B_L[ω_m ↕^{ω_v}_{ω_u}] that ω_v < ω_p. Session types are
checked to be well-formed upon process definition. Given type well-formedness,
we obtain the following rules for ⊸, noting that the right rule enforces Invariant 4
by requiring an empty Φ-context. The rules for ⊗ are dual.
Y; T; $; A, z : BulwmI Syl Qs (a: CLlwy lor) 


(T-ri) 
Y; T; 8; A, z, : A Qw [ZF Bilwm toy l we : Alw tér] F sender yL; Q i (ar: Clog ER) 


CAE E E E Bi [umt]; Agaete: Cilwj tari) 


(T-—=L3) 
Y; T; 8, æ : A Qw fór — Bilwm 20); Awe: ALl tór] E send et yL; Q = (ar: Celoj tor) 


wu 


Y; D; c; Asm: Ailo Ter] Py = (æL : Bilom 13%) 


(T-—pg) 
Y; D; -; AF yL e rew gz, ; Py 2: (æ: AL Qui [Zr By [wm tS? ]) 


Since there are no invariants imposed on the shared context Γ, the rules
for shared channel output and input are identical to those in SILLs. The only
differences are that we have two left rules and that the Φ-context of the right rule
must be empty to satisfy Invariant 4. The former is merely due to the tracking
of possibly aliased sessions in the Φ context.

Y; Dyys: Aswi tor]; A, zı : Bulwm og] F Qus © (a : CLlwy Tor) 

W038; A, x: 5a: As@uy [Er . Bi [wm Ter] F ys = reev ti; Qus 2s (a: CL 

Y; T, ys : Asli fór]; P, a : Bulwm Toi); AF Qus : (a : Clw Tog 

Y: D; 2, : Ir: As Qw [Er . Bi [wm og]; AF ys — recv zı ; Qy © (2 : Ci [wy orl) 
Y; T, ys : Aslan opl; ; AF Ps (av: Bilwm for] 

Y; T, ys : Aswi fór]: ; AF send zı ys; P :: (x : Ix:AsQw Jor. Bilwm óg] 


(T-3R) 


We finally consider the rules for forwarding and spawning. We allow a shared 
forward between processes that offer the same session at the same worlds. 
Because forwards have to be world-invariant, however, no well-typed program 
could ever have a linear forward. The process being forwarded to must be in 
either of the contexts Φ or Δ, and thus satisfies Invariant 2, making it impossible
for the world annotations of the forwarder and forwardee to match. We omit 
linear forwarding and discuss possible future extensions in Sect. 6. 


  Ψ; Γ, y_S : A_S[ω_j ↕^{ω_n}_{ω_k}] ⊢ fwd x_S y_S :: (x_S : A_S[ω_j ↕^{ω_n}_{ω_k}])    (T-ID_S)

The rules for spawning depend on the possible modes of the spawning
and spawned processes: (T-SPAWN_LL) specifies how a linear process can spawn
another linear process; (T-SPAWN_SS) specifies how a shared process can spawn
another shared process. The rules are checked relative to a process definition
found in the signature Σ and to a world substitution mapping γ : |Ψ'| → |Ψ|,
such that for each δ ∈ Ψ' we have Ψ ⊢ γ(δ), where |Ψ| denotes the field of
Ψ (i.e., the union of its domain and range). As usual, we lift substitution to
types γ(A_m), contexts γ(Γ), and orders γ(Ψ). Both rules ensure that, given the
mapping γ, the order Ψ of the spawning process entails the one of the process
definition (Ψ ⊢ γ(Ψ')). The linear spawn rule (T-SPAWN_LL) further enforces
Invariant 2 for the spawned child. We note that the spawned child enters the
linear context Δ in the spawning process' continuation since no aliases to such
a process can exist at this point.
Ar =u: BLlwm Iot] =H: By [ömt] Ty = 25: Cs foz] 
1 1 1 ò 1 1 , 
(HER a ADs] = XL APT = Pay dom(A!) dom(b!) dom(r') wl) E F 
P 6 ; E P A r 
WALST D = Alles leR] (A) =A AD =S VU) =T ACB") 
yt + wt < Wk 


Y; Ti, I2; $2; Ao, æ, : Alwi fór] E Qa, # (2! : Dilwi fot] 


— 7 T (T-SPAwNLL) 
Y; D, T2; $1, 82; Ar, A2 F a : Alw TSP] — XL = WH, Ss Qay = l i Di wily) 


as 5 
Ty = zs : Cs[wr Tor] (P F xg: Aslôj 157] Xs- M= Pat dom(L!),w!!) EX 
` 5 ‘ > 
AASI Tar D = Aslwj tén] ACE) = Py wr 4(w"') 
Y; Di, Ta, a5 : Aslwj TE") + Qas = (zg : Ds[wild§)) 


Y; T1, Ta H ag: Aswj lor] — Xs — Z5; Qag = (2g : Ds[wi loi) 


(T-SPAWN¢gs ) 


In the companion technical report [4], we provide a variant of rule
(T-SPAWN_LL) for the case of a linear recursive tail call. Without linear forwarding,
a linear tail call can no longer be implicitly "de-sugared" into a spawn and
a linear forward [2,22,52], but must be accounted for explicitly. In the report,
we also provide the rules for checking process definitions. Those rules make sure
that the process' world order is acyclic, that the types of the providing session
and argument sessions are well-formed, and that the process satisfies Invariants 1
and 2.


3.3 Dining Philosophers in SILLs+


Having introduced our type system, we revisit the dining philosophers from 
Sect. 1 and show how to program the example in SILLs+, ensuring that the
program will run without deadlocks. The code is given in Fig. 4. We note the 
world annotations in the signature of the process definitions. For instance, 


thinking : {δ0 < δ1, δ1 < δ2, δ2 < δ3 ⊢ phil[δ0 ↕^{δ2}_{δ1}] ← sfork[δ1 ↕^{δ3}_{δ3}], sfork[δ2 ↕^{δ3}_{δ3}]; ·; ·}


indicates that, given the order δ0 < δ1 < δ2 < δ3, process thinking provides
a session of type phil[δ0 ↕^{δ2}_{δ1}] and uses two shared channel references of type
sfork[δ1 ↕^{δ3}_{δ3}] and sfork[δ2 ↕^{δ3}_{δ3}]. The two · signify that neither acquired nor linear
channel references are given as arguments. The signature indicates that the two
shared fork references reside at different worlds, such that the world of the first
one is smaller than the one of the second.

Let’s briefly convince ourselves that the two acquires in process thinking in 
Fig. 4 are type-correct. For each acquire we have to show that: the world of the 
resource to be acquired is within the acquiring process’ range; the max of the 
acquiring process is smaller than the min of the acquired resource; and, that 
the self of the acquired resource is larger than those of all already acquired 
resources. We can convince ourselves that all those conditions are readily met. 
lfork = ↓^S_L sfork
sfork = ↑^S_L lfork
phil = 1

thinking : {δ0 < δ1, δ1 < δ2, δ2 < δ3 ⊢
            phil[δ0 ↕^{δ2}_{δ1}] ← sfork[δ1 ↕^{δ3}_{δ3}], sfork[δ2 ↕^{δ3}_{δ3}]; ·; ·}
c[δ0 ↕^{δ2}_{δ1}] ← thinking ← left[δ1 ↕^{δ3}_{δ3}], right[δ2 ↕^{δ3}_{δ3}] =
  left' ← acquire left ;
  right' ← acquire right ;
  c ← eating ← left', right' ;

eating : {δ0 < δ1, δ1 < δ2, δ2 < δ3 ⊢
          phil[δ0 ↕^{δ2}_{δ1}] ← ·; lfork[δ1 ↕^{δ3}_{δ3}], lfork[δ2 ↕^{δ3}_{δ3}]; ·}
c[δ0 ↕^{δ2}_{δ1}] ← eating ← left'[δ1 ↕^{δ3}_{δ3}], right'[δ2 ↕^{δ3}_{δ3}] =
  right ← release right' ;
  left ← release left' ;
  c ← thinking ← left, right

fork_proc : {δ0 < δ1 ⊢ sfork[δ0 ↕^{δ1}_{δ1}]}
c[δ0 ↕^{δ1}_{δ1}] ← fork_proc =
  c' ← accept c ;
  c ← detach c' ;
  c'' : sfork[δ0 ↕^{δ1}_{δ1}] ← fork_proc ;
  fwd c c''


Fig. 4. Deadlock-free version of dining philosophers in SILLs+. 


We note, however, that if we were to swap the two acquires, the program would 
not type-check. 
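
The three conditions can be replayed mechanically. The sketch below is an illustration only: it encodes an annotation as a dictionary with `self`, `min`, and `max` worlds, assumes that the philosopher's range is [δ1, δ2] and that both forks have min and max δ3 (one plausible reading of the signature), treats range membership as inclusive, and uses the hypothetical helper names `closure`, `lt`, `le`, and `can_acquire`.

```python
def closure(pairs):
    """Transitive closure of a set of (smaller, larger) world pairs."""
    result = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(result):
            for (c, d) in list(result):
                if b == c and (a, d) not in result:
                    result.add((a, d))
                    changed = True
    return result


def lt(order, a, b):
    return (a, b) in closure(order)


def le(order, a, b):
    return a == b or lt(order, a, b)


def can_acquire(order, parent, held_selves, resource):
    """Check the three acquire conditions described in the text."""
    # 1. The resource's self world lies within the acquirer's range.
    in_range = (le(order, parent["min"], resource["self"])
                and le(order, resource["self"], parent["max"]))
    # 2. The acquirer's max is smaller than the resource's min.
    below_child = lt(order, parent["max"], resource["min"])
    # 3. The resource's self is larger than that of every held resource.
    locking_up = all(lt(order, h, resource["self"]) for h in held_selves)
    return in_range and below_child and locking_up


order = {("d0", "d1"), ("d1", "d2"), ("d2", "d3")}
phil = {"self": "d0", "min": "d1", "max": "d2"}   # assumed annotation
left = {"self": "d1", "min": "d3", "max": "d3"}   # assumed annotation
right = {"self": "d2", "min": "d3", "max": "d3"}  # assumed annotation

# Acquire left, then right: both checks succeed.
print(can_acquire(order, phil, [], left))        # True
print(can_acquire(order, phil, ["d1"], right))   # True

# Swapped order: the second acquire violates the locking-up condition.
print(can_acquire(order, phil, [], right))       # True
print(can_acquire(order, phil, ["d2"], left))    # False
```

In the swapped order the failing check is the third one, since the self world of the fork already held (δ2) is not smaller than that of the fork to be acquired (δ1).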

Let us once more set the table for three philosophers and three forks. We
execute this code in a process with self world δa and max world δb, such that δa < δb.
We first create new worlds and define their order:


w1 ← new_world; w2 ← new_world; w3 ← new_world; w4 ← new_world;
δa < w1; δa < w2; δb < w1; w1 < w2; w1 < w3; w1 < w4; w2 < w3; w2 < w4; w3 < w4;


We then spawn the forks, each residing at a different world, such that the max 
world of a fork is higher than the self of the highest fork, ensuring Invariant 2 
for the philosopher processes that we spawn afterwards: 


f1 : sfork[w1 ↕^{w4}_{w4}] ← fork_proc ; f2 : sfork[w2 ↕^{w4}_{w4}] ← fork_proc ;
f3 : sfork[w3 ↕^{w4}_{w4}] ← fork_proc ;


When we spawn the philosophers, we ensure that P0 is going to pick up fork F1
and then F2, P1 is going to pick up F2 and then F3, and P2 is going to pick up
F1 and then F3:


p0 : phil[δa ↕^{w2}_{w1}] ← thinking ← ·; ·; f1, f2 ; p1 : phil[δa ↕^{w3}_{w2}] ← thinking ← ·; ·; f2, f3 ;
p2 : phil[δa ↕^{w3}_{w1}] ← thinking ← ·; ·; f1, f3 ;


We note that the deadlocking spawn

p2 : phil[δa ↕^{w3}_{w1}] ← thinking ← ·; ·; f3, f1 ;

is type-incorrect since we would substitute both w1 and w3 for δ1, and w3 and w1
for δ2, which violates the ordering constraints put in place by typing.
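
The entailment check behind a spawn can be sketched as follows. This is a simplified illustration, not the typing rule itself: it only checks that the spawner's order entails the substituted order of the process definition, and it shows one way the deadlocking spawn fails, namely when the first fork argument is mapped to w3 and the second to w1. The spawner's order is taken as read from the listing above, and the names `closure` and `entails` are hypothetical.

```python
def closure(pairs):
    """Transitive closure of a set of (smaller, larger) world pairs."""
    result = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(result):
            for (c, d) in list(result):
                if b == c and (a, d) not in result:
                    result.add((a, d))
                    changed = True
    return result


def entails(spawner_order, gamma, definition_order):
    """True if every substituted pair of the definition's order is entailed."""
    tc = closure(spawner_order)
    return all((gamma[a], gamma[b]) in tc for (a, b) in definition_order)


# Order of the spawning process (assumed from the listing in the text).
spawner_order = {
    ("da", "w1"), ("da", "w2"), ("db", "w1"),
    ("w1", "w2"), ("w1", "w3"), ("w1", "w4"),
    ("w2", "w3"), ("w2", "w4"), ("w3", "w4"),
}

# Order required by the definition of thinking: d0 < d1 < d2 < d3.
thinking_order = [("d0", "d1"), ("d1", "d2"), ("d2", "d3")]

# Philosopher using forks f1 (at w1) and f2 (at w2), in that order.
good = {"d0": "da", "d1": "w1", "d2": "w2", "d3": "w4"}
print(entails(spawner_order, good, thinking_order))  # True

# Deadlocking variant: forks f3 (at w3) and f1 (at w1), in that order.
bad = {"d0": "da", "d1": "w3", "d2": "w1", "d3": "w4"}
print(entails(spawner_order, bad, thinking_order))   # False
```

The second check fails because w3 < w1 is not entailed by the spawner's order, mirroring the requirement that the first fork reside at a smaller world than the second.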
3.4 Dynamics 


We now give the dynamics of SILLs+. Our current system is based on a synchronous
dynamics. While this choice is more conservative, it allows us to narrow
the complexity of the problem at hand.

As in SILLs, we use multiset rewriting rules [12] to capture the dynamics
of SILLs+ (see Sect. 2). Multiset rewriting rules represent computation
in terms of local state transitions between configurations of processes, only
mentioning the parts of a configuration they rewrite. We use the predicates
proc(a_m, w_{a1} ↕^{w_{a3}}_{w_{a2}}, P_{a_m}) and unavail(a_S, w_{a1} ↕^{w_{a3}}_{w_{a2}}) to define the states of a
configuration (see Sect. 5.1). The former denotes a process executing term P that
provides along channel a_m at mode m with worlds w_{a1}, w_{a2}, and w_{a3} for self,
min, and max, respectively. The latter acts as a placeholder for a shared process
providing along channel a_S with worlds w_{a1}, w_{a2}, and w_{a3} for self, min, and max,
respectively, that is currently unavailable. We note that since worlds are also
run-time artifacts, they must occur as part of the state-defining predicates.

Fig. 5 lists selected rules of the dynamics. Since the rules remain largely the 
same as those of SILLs, apart from the world annotations that are “threaded 
through” unchanged, we only discuss the rules that actually differ from the 
SILLs rules. The interested reader can find the remaining rules in the companion 
technical report [4]. 


(D-SPAWNLL) 

proc(a, Wa; wed, 2: AilWe, tw] © Xi + @,&, ds; Qn), 

Idef(W! F x! : Ai |ê; +] EX A’, I= Pat dom(A') dom!) ,dom( 1"), w” ) 

—> proc(b., Wor Swe?» [bL/ x, @/dom(A’), &/dom( 8’), ds/dom(I")]4( Pay dom( A") ,dom(®"),dom(r"), w )), 
proc(a., Wa twas, [b./2i] Qn), 
unavail(bs, Wo; wg ) (b fresh) 

(D-NEw) 

proc(a, Wai tne’; w + new_world; Qw) —> proc(a, wa, Tias Qw) (w fresh) 

(D-ORD) 

proc(a, Wa was, w<w; Q) — proc(a, Way ws, Q) 


Fig. 5. Selected multiset rewriting rules of SILLs+. 


Noteworthy are the rules D-NEW and D-ORD for creating and relating
worlds, respectively. Rule D-NEW creates a fresh world, which will be globally
available in the configuration. Rule D-ORD, on the other hand, updates the
configuration's order with the pair w < w'. Rule D-SPAWN_LL, lastly, substitutes
actual worlds for world variables in the body of the spawned process, using the
substitution mapping γ defined earlier. It relies on the existence of a corresponding
definition predicate for each process definition contained in the signature Σ.
We note that the substitution γ in rule D-SPAWN_LL instantiates the appropriate
world variables in the spawned process P.
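
As a rough operational intuition for D-NEW and D-ORD, the following sketch models only the world-related part of a configuration. It is an assumption-laden toy, not the multiset rewriting semantics: the class `Configuration` and its methods are hypothetical names, and processes and channels are left out entirely.

```python
import itertools


class Configuration:
    """Toy model of the globally available worlds and their order."""

    def __init__(self):
        self.worlds = set()
        self.order = set()                 # set of (smaller, larger) pairs
        self._counter = itertools.count(1)

    def new_world(self):
        """D-NEW analogue: create a fresh world, available configuration-wide."""
        world = f"w{next(self._counter)}"
        self.worlds.add(world)
        return world

    def relate(self, smaller, larger):
        """D-ORD analogue: extend the order with the pair smaller < larger.

        No acyclicity check happens here; in the formal system that
        guarantee comes from typing, not from the dynamics.
        """
        if smaller not in self.worlds or larger not in self.worlds:
            raise ValueError("both worlds must already exist")
        self.order.add((smaller, larger))


cfg = Configuration()
w1 = cfg.new_world()
w2 = cfg.new_world()
cfg.relate(w1, w2)
print(sorted(cfg.worlds), sorted(cfg.order))
```

Keeping the acyclicity check out of `relate` reflects the division of labor in the text: the typing rules for ordering worlds preserve acyclicity, so the run-time rule merely records the new pair.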
4 Extended Example: An Imperative Shared Queue 


We now develop a typical imperative-style implementation of a queue that uses 
a list data structure internally to store the queue’s elements and has shared 
references to the front and the back of the list for concurrent dequeueing and 
enqueueing, respectively. The session types for the queue and the list are:²


queue A_S = ↑^S_L &{enq : Πx:A_S. ↓^S_L queue A_S,
                    deq : ⊕{none : ↓^S_L queue A_S, some : ∃x:A_S. ↓^S_L queue A_S}}


list A_S = ↑^S_L &{ins : Πx:A_S. ∃y:list A_S. ↓^S_L list A_S,
                   del : ⊕{none : ↓^S_L list A_S, some : ∃x:A_S. ↓^S_L list A_S}}


The list is implemented in terms of processes empty and elem, denoting the 
empty list and a cons cell, respectively. We show the more interesting case of a 
cons cell (Fig. 6). The queue is defined by processes head (Fig. 7) and queue_proc
(Fig. 8), the latter being the queue’s interface to its clients. 


elem : {61 < 62,52 < 63,53 < 64 H list[51 452] As[53454] — As[ds 154], list[d11$2] As [ôs T$4]} 


c[61 $52] [53 tst] < elem  x[53t§4], neat [61432] [53t54] = 
œ + accept c ; 
case c’ of 
| ins > y & reeve’;n¢ elem + y, next; send œ n ; 
c + detach c’ ; 
c” : list[o1 52] As [ős sa] + elem + z, n; fwd c c” 
| del + c'.some ; send c x ; 
c 4+ detach c’ ; fwd c next 


Fig. 6. Imperative queue — elem process. 


We can now define a client (Fig. 8) for the queue, assuming existence of a
corresponding shared session type item and a process item_proc offering a session
of type item[δ3 ↕^{δ4}_{δ4}]. The client instantiates the queue at world δb, allowing it
to acquire resources at world w1, which is exactly the world at which process
queue_proc instantiates the list. Given that the client itself resides at world δa,
which is smaller than the queue's world δb, the client is allowed to acquire the
queue, which in turn will acquire the list to satisfy any requests by the client.

The example showcases a paradigmatic use of several collaborators, where
collaborators can hold resources while they "talk down" in the tree. In particular,
as illustrated in Fig. 9, the clients C1, C2, and C3 compete for resources at
world δb, i.e., the queue Q. On the other hand, a client C_i collaborates with the
queue Q, the list elements L_i, and the items I_i, since they do not overlap in

2 We adopt polymorphism for the example without formal treatment since it is orthog- 
onal and has been studied for session types in [23,46]. 
head : {50 < 61,51 < 62,52 < 53,63 < 54 + queue[dot$"] As [63 t$4] list [51 52] As [53454], 
list{5. $$] As[d3 1341} 


{bot ][6s28] + head + front Q421 [6st], back( ds tös tii] = 
c + accept c ; 
case c’ of 
| enq > z + recvc’ ; 
back’ + acquire back ; 
back’ ins ; send back’ x ; e + recv back’ ; 
back + release back’ ; 
c 4 detach c’; c”: queuefðo $$] As [ős al + head + front, e ; fwd c c” 
| deq > front’ < acquire front ; 
front’ del ; 
(case front’ of 
| none + front + release front’ ; c'.none ; c + detach c’ ; 
e; queue[do4$"] As [53 | + head + front, back ; fwd c c” 
| some > x + recv front’ ; 
front < release front’ ; 
c!.some ; send c’ x ; c + detachc’ ; 
ee queue[dot$"] As [63 t$4] + head + front, back ; fwd c c”) 


Fig. 7. Imperative queue — head process. 


queue_proc : {ôo < 61,61 < 63,63 <64 client : {da < dp F 1[5at3>]} 


k queue[ðo $5: ] As [63 t$4]} [bat] & client — 
b 


c[do S1] [ős | < queue_proc = Wi + new_world ; w3 + new_world ; 
w2 + new_world ; w4 + new_world ; 
01 < w2 ; W2 < 03 ; Ôb < W1 ; W1 < W3 5 W3 < W3 ; 
e : list[d1 f2] As[63 s] + empty ; io : item[w3 twa] — item proc ; 
ee queue[dot$?] As [63194] + head q : queue(de tnt] As[ws fwi] <- queue_proc ; 
eee: q' < acquire q ; q'.enq ; send q/ io ; 
wde” q + release q’ ; close c 


Fig. 8. Imperative queue — queue_proc process and client process. 


the set of resources they may acquire: a client acquires resources at δb, a queue
resources at w1, a list resources at w2, and an item resources at w4, and we have
δa < δb < w1 < w2 < w3 < w4. We note in particular that the setup prevents a
list element from acquiring its successor, forcing linear access through the queue.


5 Semantics 


In this section, we discuss the meta-theoretical properties of SILLs+, focusing on 
deadlock-freedom. The companion technical report [4] provides further details. 


Fig. 9. Run-time process graph for imperative queue (see Fig. 3 for legend). 


5.1 Configuration Typing and Preservation 


Given the hierarchy between mode S and L and the fact that shared processes
cannot depend on linear processes, we divide a configuration into a shared part
Λ and a linear part Θ. We use the typing judgment Ψ; Γ ⊨ Λ; Θ :: Γ; Φ, Δ to
type configurations. The judgment expresses that a well-formed configuration
Λ; Θ provides the shared channels in Γ and the linear channels in Φ and Δ.
A configuration is type-checked relative to all shared channel references and a
global order Ψ. While type-checking is compositional insofar as each process
definition can be type-checked separately, solely relying on the process' local
Ψ (and Γ), at run-time, the entire order that a configuration relies upon is
considered. We give the configuration typing rules in Fig. 10.

Our progress theorem crucially depends on the guarantee that Invariants 1
and 2 from Sect. 3 hold for every linear process in a configuration's tree.
This is expressed by the premises Inv1(proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, P_{a_L})) and
Inv2(proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, P_{a_L})) in rule (T-Θ2), based on Definitions 1 and 2
below that restate Invariants 1 and 2 for an entire configuration. We note that
Invariant 2 is based on the set of all transitive children (i.e., descendants) of a
process. We formally define the notion of a descendant inductively over a
well-typed linear configuration. The interested reader can find the definition in the
companion technical report [4].


Invariant 1 (min(parent) ≤ self(acquired_child) ≤ max(parent)). If Ψ; Γ ⊨ Θ ::
Φ, Δ and for any proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, P_{a_L}) ∈ Θ such that Ψ; Γ; Φ1; Δ1 ⊢ P_{a_L} :: (a_L :
A_L[w_{a1} ↕^{w_{a3}}_{w_{a2}}]), Inv1(proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, P_{a_L})) holds if and only if for every acquired
resource b_L : B_L[w_{b1} ↕^{w_{b3}}_{w_{b2}}] ∈ Φ1 it holds that Ψ* ⊢ w_{a2} ≤ w_{b1} ≤ w_{a3}. Moreover,
if P_{a_L} = x_L ← acquire c_S; Q_{x_L} for a (c_S : ↑^S_L C_L[w_{c1} ↕^{w_{c3}}_{w_{c2}}]) ∈ Γ, then, for every
acquired resource b_L : B_L[w_{b1} ↕^{w_{b3}}_{w_{b2}}] ∈ Φ1, it holds that Ψ+ ⊢ w_{b1} < w_{c1} and that
Ψ* ⊢ w_{a2} ≤ w_{c1} ≤ w_{a3}.
  ----------------------- (T-Θ1)
  Ψ; Γ ⊨ (·) :: (·)
(as : Blwa, cia eT H (A., Ê) sesync Y F A [wa Ta] type 

Ww F Way < Wa Invi(proc(a., Wa, ty Pa )) Inv2(proc(a., Wa; alts Pa )) 
v: r FO: 8,8, A, Ai Y; T; 1; A, F Pa: (av: A. [wa twe?]) 


Ww; EO, proc(a, Wai la Pa): (8, A, a : A. [wa; twe?]) 


(T-@2) 


E (MAL, tA.) sesync W H TP Aiwa fws] type 
Wb Wap SWa WD H Pag i (as : TAL [Way we?) 
Y; T E proc(as, Wa, wads Pas) i (as : TAL [Way tne3]) 

UWTDEA:T WTEA 2 TD 
Wea z wan (T-As) = Lo 2 
W; T E unavail(as, Wa; twas) :: (as : A[Wa; twas ]) WDE AA’ :: Ty, D2 


(T-A1) (T-42) 


(T-44) 


UWTFEA: T vrO: A 
Y: rEAO0O:: T;8,A 


(T-2) 


Fig. 10. Configuration typing 


Invariant 2 (max(parent) < min(descendants)). If Ψ; Γ ⊨ Θ :: Φ, Δ
and for any proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, P_{a_L}) ∈ Θ and that process' descendants
(Ψ; Γ ⊨ Θ :: Φ, Δ) ▷ a_L = (Φ', Δ'), Inv2(proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, P_{a_L})) holds iff for every
descendant b_L : B_L[w_{b1} ↕^{w_{b3}}_{w_{b2}}] ∈ (Φ', Δ') it holds that Ψ+ ⊢ w_{a3} < w_{b2}.


Our preservation theorem states that Invariants 1 and 2 are preserved for 
every linear process in the configuration along transitions. Moreover, the theorem 
expresses that the types of the providing linear channels Φ and Δ are maintained
along transitions and that new shared channels and worlds may be allocated. 
The proof relies, in particular, on session types being strictly equi-synchronizing, 
on a process’ type well-formedness and assurance that the process’ min world is 
less than or equal to its max world. 


Theorem 5.1 (Preservation). If Ψ; Γ ⊨ Λ; Θ :: Γ; Φ, Δ and Λ; Θ → Λ'; Θ',
then Ψ'; Γ' ⊨ Λ'; Θ' :: Γ'; Φ, Δ, for some Λ', Θ', Ψ', and Γ'.


5.2 Progress 


In our development so far we have distilled the two scenarios of interdepen- 
dencies between processes that can lead to deadlocks: cyclic acquisitions and 
interdependent acquisitions and synchronizations. This has led to the development
of a type system that ingrains the notions of competitors and collaborators,
such that the former compete for a set of resources whereas the latter do not 
overlap in the set of resources they acquire. Our type system then ties these 
notions to a configuration’s linear process tree such that collaborators stand in a 
parent-descendant relationship to each other and competitors in a sibling/cousin 
relationship. In this section, we prove that this orchestration is sufficient to rule 
out any of the aforementioned interdependencies. 
To this end we introduce the notions of red and green arrows that allow us
to reason about process interdependencies in a configuration's tree. A red arrow
points from a linear proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, Q) to a linear proc(b_L, w_{b1} ↕^{w_{b3}}_{w_{b2}}, P), if the
former is attempting to acquire a resource held by the latter and, consequently, is
waiting for the latter to release that resource. A green arrow points from a linear
proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, Q) to a linear proc(b_L, w_{b1} ↕^{w_{b3}}_{w_{b2}}, P), if the former is waiting to
synchronize with the latter. We define these arrows formally as follows:


Definition 5.2 (Acquire Dependency: "Red Arrow"). Given a well-formed
and well-typed configuration Ψ; Γ ⊨ Λ; Θ :: Γ; Φ, Δ, there exists a
waiting-due-to-acquire relation A(Θ) among linear processes in Θ at run-time such that

proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, x_L ← acquire c_S; Q_{x_L}) <_A proc(b_L, w_{b1} ↕^{w_{b3}}_{w_{b2}}, P(c_L))

where P(c_L) denotes a process term with an occurrence of channel c_L.


Definition 5.3 (Synchronization Dependency: "Green Arrow").
Given a well-formed and well-typed configuration Ψ; Γ ⊨ Λ; Θ :: Γ; Φ, Δ, there
exists a waiting-due-to-synchronization relation S(Θ) among linear processes in
Θ at run-time such that

proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, ⟨b_L⟩; Q) <_S proc(b_L, w_{b1} ↕^{w_{b3}}_{w_{b2}}, ⟨¬b_L⟩; P)
proc(b_L, w_{b1} ↕^{w_{b3}}_{w_{b2}}, ⟨b_L⟩; P) <_S proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, ⟨¬b_L⟩; Q(b_L))

where P(a_L) denotes a process term with an occurrence of channel a_L, ⟨a_L⟩; P a
process term that currently executes an action along channel a_L, and ⟨¬a_L⟩; P a
process term whose currently executing action does not involve the channel a_L.


It may be helpful to consult Fig. 3 at this point and note the semantic difference
between the violet arrows in that figure and the red arrows discussed
here. Whereas violet arrows point from the acquiring process to the resource
being acquired, red arrows point from the acquiring process to the process that
is holding the resource. Thus, violet arrows can go out of the tree, while red
arrows stay within. Given the definitions of red and green arrows, we can define
the relation W(Θ) on the configuration's tree, which contains all process pairs
that are in some way waiting for each other:


Definition 5.4 (Waiting Dependency). Given a well-formed and well-typed
configuration Ψ; Γ ⊨ Λ; Θ :: Γ; Φ, Δ, there exists a waiting relation
W(Θ) among processes in Θ at run-time such that
proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, P) <_W proc(b_L, w_{b1} ↕^{w_{b3}}_{w_{b2}}, Q),

- if proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, P) <_A proc(b_L, w_{b1} ↕^{w_{b3}}_{w_{b2}}, Q), or
- if proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, P) <_S proc(b_L, w_{b1} ↕^{w_{b3}}_{w_{b2}}, Q).


Having defined the relation W(Θ), we can now state the key lemma underlying
our progress theorem, indicating that W(Θ) is acyclic in a well-formed and
well-typed configuration.
Lemma 5.5 (Acyclicity of W(Θ)). If Ψ; Γ ⊨ Λ; Θ :: Γ; Φ, Δ, then W(Θ) is
acyclic.
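
To make the statement concrete, the waiting relation can be viewed as a directed graph whose edges are the union of red and green arrows; a deadlock corresponds to a cycle in that graph. The sketch below is only an illustration with hypothetical process names and a hypothetical helper `has_cycle`; it checks a given graph for cycles and says nothing about how typing rules them out.

```python
def has_cycle(edges):
    """Detect a cycle in a directed graph given as a set of (src, dst) edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2           # unvisited, on the stack, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for succ in graph[node]:
            if color[succ] == GREY:         # back edge: a cycle exists
                return True
            if color[succ] == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Red arrows: the source waits to acquire a resource held by the target.
red = {("P0", "P1")}
# Green arrows: the source waits to synchronize with the target.
green = {("P1", "P2")}

waiting = red | green                        # union of both kinds of arrows
print(has_cycle(waiting))                    # False

# Classic circular wait among three philosophers: every one waits for the next.
circular = {("P0", "P1"), ("P1", "P2"), ("P2", "P0")}
print(has_cycle(circular))                   # True
```

The lemma states that graphs like `circular` cannot arise from a well-formed and well-typed configuration.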


We focus on explaining the main idea of the proof here. The proof proceeds
by induction on Ψ; Γ ⊨ Θ :: Φ, Δ, assuming for the non-empty case Ψ; Γ ⊨
Θ, proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, P_{a_L}) :: (Φ, Δ, a_L : A_L[w_{a1} ↕^{w_{a3}}_{w_{a2}}]) that W(Θ) is acyclic, by the
inductive hypothesis. We then know that there cannot exist any paths of green
and red arrows in Θ that form a cycle, and we have to show that there is no
way of introducing such a cyclic path by adding node proc(a_L, w_{a1} ↕^{w_{a3}}_{w_{a2}}, P_{a_L}) to
the configuration Θ. In particular, the proof considers all possible new arrows
that may be introduced by adding the node and that are necessary for creating a
cycle, showing that such arrows cannot come about in a well-typed configuration.

We illustrate the reasoning for the two selected cases shown in Fig. 11. Case
(a) represents a case in which process P_{a_L} is waiting to synchronize with its child
P_{b_L}, while holding a resource a descendant of P_{b_L} or P_{b_L} itself wants to acquire.
However, this scenario cannot come about in a well-typed configuration because
P_{a_L} and P_{b_L} are collaborators and thus cannot overlap in resources they acquire.
Case (b) represents a case in which process P_{a_L} is waiting to synchronize with
its child P_{b_L}, while another child, process P_{c_L}, is waiting to synchronize with P_{a_L}.
Given acyclicity of W(Θ), a necessary condition for a cycle to form is that there
already must exist a red arrow C in the configuration that connects the subtrees
in which the siblings P_{b_L} and P_{c_L} reside. However, this scenario cannot come
about in a well-typed configuration because P_{b_L} and P_{c_L} are competitors, forcing
P_{c_L} or any of its descendants to release a resource before synchronizing with P_{a_L}.
These arguments are made precise in various lemmas in [4].


Fig. 11. Two prototypical cases in proof of acyclicity of W(Θ).


Given acyclicity of W(Θ), we can state and prove the following strong
progress theorem. The theorem relies on the notion of a poised process, a process
currently executing an action along its offering channel, and distinguishes
a configuration only consisting of the top-level, linear "main" process from one
that consists of several linear processes. We use |Θ| to denote the cardinality
of Θ:
Theorem 5.6 (Progress). If Ψ; Γ ⊨ Λ; Θ :: (Γ; c_L : 1[w_{c1} ↕^{w_{c3}}_{w_{c2}}]), then either

- Λ → Λ', for some Λ', or
- Λ is poised and
  • if |Θ| = 1, then either Λ; Θ → Λ'; Θ', for some Λ' and Θ', or Θ is poised, or
  • if |Θ| > 1, then Λ; Θ → Λ'; Θ', for some Λ' and Θ'.


The theorem indicates that, as long as there exist at least two linear processes 
in the configuration, the configuration can always step. If the configuration only 
consists of the main process, then this process will become poised (i.e., ready to 
close), once all sub-computations are finished. The proof of the theorem relies 
on the acyclicity of W(Θ) and the fact that all sessions must be strictly
equi-synchronizing.


6 Additional Discussion 


Linear Forwarding. Our current formalization does not include linear for- 
warding because a forward changes the process tree and thus endangers the 
invariants imposed on it. This means that certain programs from the purely lin- 
ear fragment may not type-check in our system. However, the correspondingly 
η-expanded versions of these programs should be expressible and type-checkable
in SILLs+. As part of future work, we want to explore the addition of the linear 
forward 


Wt F wn < Wy 
P: D; y: A, [wm [22] F fwd zı y :: (a, : A, [w; Te?) 


(T-ID_) 


which allows forwarding to processes that are known to not yet be aliased and 
whose world annotations meet the premise Wt F wpn < wu. Restricting to pro- 
cesses in A should uphold Invariant 1, while the premise of the rule should uphold 
Invariant 2. However, this change will affect the inner working of the proofs, the 
use of inversion in particular, which might have far-reaching consequences that 
need to be carefully explored. 


Unbounded Process Networks and World Polymorphism. The typing 
discipline presented in the previous sections, while rich enough to account for 
a wide range of interesting programs, cannot type programs that spawn a stat- 
ically undetermined number of shared sessions that are then to be used. For 
instance, while we can easily type a configuration of any given number of dining 
philosophers (Sect. 3.3), we cannot type a recursive process in which the number 
of philosophers (and forks) is potentially unbounded (as done in [21,38]), due to 
the way worlds are created and propagated across processes. 

The general issue lies in implementing a statically unbounded network of pro- 
cesses that interact with each other. These interactions require the processes to 
be spawned at different worlds which must be generated dynamically as needed. 
To interact with such a statically unknown number of processes uniformly, their
offering channels must be stored in a list-like structure for later use. However, 
in our system, recursive types have to be invariant with respect to worlds. For 
instance, in a recursive type such as T = A, Qu, lir ®T, the worlds w, wp, Wr 
are fixed in the unfoldings of T. Thus, we cannot type a world-heterogeneous 
list and cannot form such process networks. 

Given that the issues preventing us from typing such unbounded networks 
lie in problems of world invariance, the natural solution is to explore some form 
of world polymorphism, where types can be parameterized by worlds which are 
instantiated at a later stage. Such techniques have been studied in the context of 
hybrid logical processes in [7] by considering session types of the form ∀δ.A and
∃δ.A, sessions that are parametric in the world variable δ, which is instantiated
by a concrete reachable world at runtime. While their development cannot be 
mapped directly to our setting, it is a promising avenue of future work. 


7 Related Work 


Behavioral Type Analysis of Deadlocks. The addition of channel usage 
information to types in a concurrent, message-passing setting was pioneered by 
Kobayashi and Igarashi [30,34], who applied the idea to deadlock prevention 
in the π-calculus and later to more general properties [31,32], giving rise to a
generic system that can be instantiated to produce a variety of concrete typing
disciplines for the π-calculus (e.g., race detection, deadlock detection, etc.).

This line of work types π-calculus processes with a simplified form of process
(akin to CCS [42] terms without name restriction) that characterizes the
input/output behavior of processes. These types are augmented with abstract 
data that pertain to the relative ordering of channel actions, with the type sys- 
tem ensuring that the transitive closure of such orderings forms a strict partial 
order, ensuring deadlock-freedom (i.e., communication succeeds unless a process 
diverges). Building on this, Kobayashi et al. proposed type systems that ensure 
a stronger property dubbed lock-freedom [35] (i.e., communication always succeeds),
and variants that are amenable to type inference [36,39]. Kobayashi [37]
extended this latter system to more accurately account for recursive processes 
while preserving the existence of a type inference algorithm. 

Our system draws significant inspiration from this line of work, insofar as we 
also equip types with abstract ordering data on certain communication actions, 
which is then statically enforced to form a strict partial order. We note that 
our SILLs+ language differs sufficiently from the pure π-calculus in terms of its
constructs and semantics to make the formulation of a direct comparison or an 
immediate application of their work unclear (e.g., [37] uses replication to encode 
recursive processes). Moreover, we integrate this style of order-based reasoning 
with both linear and shared session typing, which interact in non-trivial ways 
(especially in the presence of recursive types and recursive process definitions). 

In terms of typability, enforcing session fidelity can be a double-edged sword: 
some examples of the works above can be transposed to SILLs+ with mostly 
cosmetic changes and without making use of shared sessions (e.g., a parallel
implementation of factorial that recurses via replication but always answers on
a private channel); others are incompatible with linear sessions and require the
use of shared sessions via the acquire-release discipline, which entails a more
indirect but still arguably faithful modelling of the original π-calculus behavior;
some examples, however, cannot be easily adapted to the shared session discipline
(e.g., *c?(x, y).x?(z).y?(z) | *c?(x, y).y?(z).x?(z) is typable in [37], where
x?(z) denotes input on x and *c?(x, y) denotes replicated input) and their
transcription, while possible, would be too far removed from the original term to
be deemed a faithful representation. Recursive processes are known to produce
patterns that can be challenging to analyze using such order-based techniques. 
The work of [21,38] specializes Kobayashi’s system to account for potentially 
unbounded process networks with non-trivial forms of sharing. Such systems are 
not typable in our work (see Sect. 6 for additional discussion on this topic). 

The work of Padovani [44] develops techniques inspired by [35,37] to develop 
a typing system for deadlock (and lock) freedom for the linear π-calculus where
(linear) channels must be used exactly once. By enforcing this form of linearity, 
the resulting system uses only one piece of ordering data per channel usage and 
can easily integrate a form of channel polymorphism that accounts for intricate 
cyclic interleavings of recursive processes. The combination of manifest sharing 
and linear session typing does not seem possible without the use of additional 
ordering data, and the lack of single-use linear channels makes the robust channel
polymorphism of [44] not feasible in our setting. 

Dardha and Gay [15] recently integrated a system of Kobayashi-style orderings
in a logical session π-calculus based on classical linear logic, extended with
the ability to form cyclic dependencies of actions on linear session channels 
(Atkey et al. [1] study similar cycles but do not consider deadlock-freedom), 
without the need for new process constructs or an acquire-release discipline. 
Their work considers only a restricted form of replication common in linear logic- 
based works, not including recursive types nor recursive process definitions. This 
reduces the complexity of their system, at the cost of expressiveness. We also 
note that the cycles enabled by their system are produced by processes sharing 
multiple linear names. Since linearity is still enforced, they cannot represent the 
more general form of cycles that exploit shared channels, as we do. 

A comparative study of session typing and Kobayashi-style systems in terms 
of sharing was developed by Dardha and Pérez [16], showing that such order- 
based techniques can account for sharing in ways that are out of reach of both 
classical session typing and pure logic-based session typing. Our system (and 
that of [15]) aims to combine the heightened power of Kobayashi-style systems 
with the benefits of session typing, which seems to be better suited as a typing 
discipline for a high-level programming language [18]. 


Progress and Session Typing. To address limitations of classical binary ses- 
sion types, Honda et al. [27] introduced multiparty session types, where sessions 
are described by so-called global types that capture the interactions between 
an arbitrary number of session participants. Under some well-formedness 
constraints, global types can be used to ensure that a collection of processes
correctly implements the global behavior in a deadlock-free way. However, these 
global type-based approaches do not ensure deadlock freedom in the presence of 
higher-order channel passing or interleaved multiparty sessions. Coppo et al. [13] 
and Bettini et al. [6] develop systems that track usage orders among interleaved 
multiparty sessions, ruling out cyclic dependencies that can lead to deadlocks. 
The resulting system is quite intricate, since it combines the full multiparty ses- 
sion theory with the order tracking mechanism, interacts negatively with recur- 
sion (essentially disallowing interleaving with recursion) and, by tracking order 
at the multiparty session-level, ends up rejecting various benign configurations 
that can be accounted for by our more fine-grained analysis. We also highlight 
the analyses of Vieira and Vasconcelos [54] and Padovani et al. [45] that are more 
powerful than the approaches above, at the cost of a more complex analysis based 
on conversation types [10] (themselves a partial-order based technique). 


Static Analysis of Concurrent Programs. Lange et al. [40,41] develop a 
deadlock detection framework applied to the Go programming language. Their 
work distills CCS processes from programs which are then checked for deadlocks 
by a form of symbolic execution [40] and model-checked against modal μ-calculus
formulae [41] which encode deadlock-freedom of the abstracted process (among 
other properties of interest). Their abstraction introduces some distance between 
the original program and the analysed process and so the analysis is sound only 
for certain restricted program fragments, excluding any combination of recursion 
and process spawning. Our direct approach does not suffer from this limitation. 

de’Liguoro and Padovani [17] develop a typing discipline for deadlock-freedom 
in a setting where processes exchange messages via unordered mailboxes. Their 
calculus subsumes the actor model and their analysis combines both so-called 
mailbox types and specialized dependency graphs to track potential cycles 
between mailboxes in actor-based systems. The unordered nature of actor-based 
communication introduces significant differences wrt our work, which crucially 
exploits the ordering of exchanged messages. 


8 Concluding Remarks 


In this paper we have developed the concept of manifest deadlock-freedom in 
the context of the language SILLs+, a shared session-typed language, showcasing 
both the programming methodology and the expressiveness of our framework 
with a series of examples. Deadlock-freedom of well-typed programs is estab- 
lished by a novel abstraction of so-called green and red arrows to reason about 
the interdependencies between processes in terms of linear and shared channel 
references. 

In future work, we plan to address some of the limitations of the interactions 
of deadlock-free shared sessions with recursion, by considering promising notions 
of world polymorphism and world communication. We also plan to study the 
problem of world inference and the inclusion of a linear forwarding construct. 

Abstract. This paper introduces a new categorical structure that is a
model of a variant of the i/o-typed π-calculus, in the same way that a
cartesian closed category is a model of the λ-calculus. To the best of
our knowledge, no categorical model has been given for the i/o-typed
π-calculus, in contrast to session-typed calculi, to which corresponding logic
and categorical structure were given. The categorical structure introduced
in this paper has a simple definition, combining two well-known
structures, namely, closed Freyd category and compact closed category.
The former is a model of effectful computation in a general setting, and
the latter describes connections via channels, which cause the effect we
focus on in this paper. To demonstrate the relevance of the categorical
model, we show by a semantic consideration that the π-calculus is
equivalent to a core calculus of Concurrent ML.


Keywords: π-calculus · Categorical type theory ·
Compact closed category · Closed Freyd category


1 Introduction 


The Curry-Howard-Lambek correspondence reveals the trinity of the simply-typed
λ-calculus, propositional intuitionistic logic and cartesian closed category.
Via the correspondence, a type of the calculus can be seen as a formula of the 
logic, and as an object of a category; a term can be seen as a proof and as a 
morphism (see, e.g., [23]). Since its discovery, a number of variations have been 
proposed and studied. 

In concurrency theory, a correspondence between a process calculus and logic 
was established by Caires, Pfenning and Toninho [8,9] and later by Wadler [48]. 
What they found is that session types [18,20] can be seen as formulas of linear 
logic [14], and processes as proofs. This remarkable result has inspired lots of 
work (e.g. [3,4,10,25,45,46]).

This correspondence is, however, not completely satisfactory as pointed out 
in [8,26], as well as by Wadler himself [48]. The session-typed calculi in [9,48] cor- 
responding to linear logic have only well-behaved processes, because the session 
type systems guarantee deadlock-freedom and race-freedom of well-typed pro- 
cesses. This strong guarantee is often useful for programmers writing processes 
in the typed calculus, but can be seen as a significant limitation of expressive
power. For example, it prevents us from modelling wild concurrent systems or 
programs that might fall into deadlocks or race conditions. 

This paper describes an approach to a Curry-Howard-Lambek correspon- 
dence for concurrency in the presence of deadlocks and race conditions, from the 
viewpoint of categorical type theory. 


What Is the Categorical Model of the π-calculus? We focus on the π-calculus
[30,31] in this paper. This is not only because the π-calculus is widely
used and powerful, but also because of a classical result by Sangiorgi [39,42],
which is the starting point of our development.

Sangiorgi, in the early 90s, gave translations between the conventional, first-order
π-calculus and its higher-order variant [39,42]. These translations allow us
to regard the π-calculus as a higher-order programming language.

Let us review the observation by Sangiorgi, using a core of the asynchronous
π-calculus: $P ::= 0 \mid (P \mid Q) \mid \bar{a}\langle x \rangle \mid a(x).P$.¹ The idea is to decompose the
input-prefixing $a(x).P$ into $a$ and $(x).P$. Let us write $a[(x).P]$ for $a(x).P$ to emphasise
the decomposition. Then a reduction can also be decomposed as


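$$\bar{a}\langle x \rangle \mid a[(y).P] \mid Q \;\longrightarrow\; ((y).P)\langle x \rangle \mid Q \;\longrightarrow\; P\{x/y\} \mid Q,$$


where the first step is the communication and the second step is the β-reduction
(i.e. $(\lambda y.P)\, x \longrightarrow P\{x/y\}$ in the λ-calculus notation). Hence we regard


- an output $\bar{a}\langle x \rangle$ as an application of a function $\bar{a}$ to $x$, and
- an input $a(x).P$ as an abstraction $(x).P$ (or $\lambda x.P$) “located” at $a[-]$.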


Now, ignoring the mysterious operator $a[-]$, what we have are the core operations
of functional programming languages (i.e. abstraction and application).
This functional programming language is effectful; in fact, communication via 
channels is a side effect. 

This observation leads us to base our categorical model for the π-calculus
on a model for effectful functional programs. Among several models, we choose 
closed Freyd category [37] for modelling the functional part. 

Then what is the categorical counterpart of $a[-]$? As this operation seems
responsible for communication, this question can be rephrased as: what is the 
categorical structure for communication? An observation by Abramsky et al. [2] 
answered this question. They pointed out the importance of compact closed cate- 
gory [21] in concurrency theory, which nicely describes CCS-like processes inter- 
connected via ports. 

By combining the two structures described above, this paper introduces a 
categorical structure, which we call compact closed Freyd category, as a categorical
model of the π-calculus.² Despite its simplicity, compact closed Freyd


1 This calculus slightly differs from the calculus we shall introduce in Sect. 2, but the 
differences are not important here. 

2 Here is the reason why we do not use a monad for modelling the effect: it is unclear
to us how to integrate a monad with the compact closed structure. In contrast,
a Freyd category has a (pre)monoidal category as its component; we can simply 
require that it is compact closed. 
category captures the strong expressive power of the π-calculus. The compact
closed structure allows us to connect ports in an arbitrary way, in return for the 
possibility of deadlocks; the Freyd structure allows us to duplicate objects, and 
duplication of input channels introduces the possibility of race conditions. 


Reconstructing Calculi. This paper introduces two calculi that are sound
and complete with respect to the compact closed Freyd category model. One is
a variant of the π-calculus, named $\pi_F$; the design of $\pi_F$ is based on the
observations described above. The other is a higher-order programming language $\lambda_{ch}$
defined as an instance of the computational λ-calculus [33]. Designing $\lambda_{ch}$ is not
so difficult because we can make use of the correspondence between the computational
λ-calculus and closed Freyd category (see Sect. 4). The $\lambda_{ch}$-calculus has
operations for creating a channel and for sending a value via the channel and,
therefore, can be seen as a core calculus of Concurrent ML (or CML) [38].
Since the higher-order calculus $\lambda_{ch}$ and $\pi_F$ correspond to the same categorical
model, we can obtain translations between these calculi by simple semantic
computations. These translations are “correct by definition” and, interestingly,
coincide with those between higher-order and first-order π-calculus [39,42].


On β- vs. βη-theories. The categorical analysis of this paper reveals that
many conventional behavioural equivalences for the π-calculus are problematic
from the viewpoint of categorical type theory. The problem is that they induce
only semicategories, which may not have identities for some objects. This is
reminiscent of the β-theory of the λ-calculus, whose categorical model is given
by semi-categorical notions [16].

Adding a single rule (which we call the η-rule) resolves the problem. Our
categorical type theory deals only with equivalences that admit the η-rule, and
the simplicity of the theory of this paper essentially relies on the η-rule.

Interestingly, the η-rule seems to explain some phenomena in the literature.
For example, Sangiorgi observed that a syntactic constraint called locality [28,49]
is essential for his translation [39,42]. The correctness of the translation can be
proved without using the η-rule when one restricts the calculus to local processes;
we expect that Sangiorgi’s observation can be related to this phenomenon.


Contributions. This paper introduces a new variant of the i/o-typed π-calculus,
which we call $\pi_F$. A remarkable feature of $\pi_F$ is that it has a categorical
counterpart, called compact closed Freyd category. The correspondence is fairly
firm; the categorical semantics is sound and complete, and the term model is the
classifying category. The relevance of the model is demonstrated by a semantic
reconstruction of Sangiorgi’s translation [39,42]. These results open a new frontier
in the Curry-Howard-Lambek correspondence for concurrency; session types
are not the only basis for a Curry-Howard-Lambek correspondence for π-calculi.


Organisation of this Paper. Section 2 introduces the calculus $\pi_F$ and discusses
equivalences on processes. Section 3 gives the categorical semantics of $\pi_F$ and
shows soundness and completeness. A connection to a higher-order programming
language with channels is studied in Sect. 4. In Sect. 5, we (1) discuss how our work
relates to linear logic and (2) present some ideas for how to extend the application
range of our model. We discuss related work in Sect. 6 and conclude in Sect. 7.
Omitted proofs, as well as detailed definitions, are available in the full version.


2 A Polyadic, Asynchronous π-calculus with i/o-types


This section introduces a variant of the π-calculus, named $\pi_F$. It is based on a fairly
standard calculus, namely the polyadic and asynchronous π-calculus with i/o-types,
but the details are carefully designed so that $\pi_F$ has a categorical model.


2.1 The $\pi_F$-calculus


This subsection defines the calculus $\pi_F$, which is based on an asynchronous variant
of the polyadic π-calculus with i/o-types in [35]. The aim of this subsection
is to explain how it differs from the conventional π-calculus. Although
$\pi_F$ has some uncommon features, each of them was studied in the literature; see
Related Work (Sect. 6) for related ideas and calculi.


Types. The set of types, ranged over by $S$ and $T$, is given by

$$S, T ::= \mathbf{ch}^o[T_1, \dots, T_n] \mid \mathbf{ch}^i[T_1, \dots, T_n] \qquad (n \ge 0).$$

The type $\mathbf{ch}^o[T_1, \dots, T_n]$ is for output channels sending $n$ arguments of types
$T_1, \dots, T_n$. The type $\mathbf{ch}^i[T_1, \dots, T_n]$ is for input channels. The dual $T^\perp$ of type
$T$ is defined by $\mathbf{ch}^o[\vec{T}]^\perp := \mathbf{ch}^i[\vec{T}]$ and $\mathbf{ch}^i[\vec{T}]^\perp := \mathbf{ch}^o[\vec{T}]$. For a sequence
$\vec{T} = T_1, \dots, T_n$ of types, we write $\vec{T}^\perp$ for $T_1^\perp, \dots, T_n^\perp$.

An important difference from [35] is that no channel allows both input and
output operations. We will refer to this feature of $\pi_F$ as i/o-separation.
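
As a minimal sketch of the type grammar and the duality operation, the following Python fragment models channel types as an immutable record. The names `Ch`, `dual` and `dual_seq` are illustrative assumptions, not notation from the calculus. Duality flips only the top-level mode and leaves the argument types unchanged, following the definition of the dual given above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Ch:
    """Channel type: mode "o" is ch^o[T_1..T_n], mode "i" is ch^i[T_1..T_n]."""
    mode: str
    args: Tuple["Ch", ...] = ()   # n >= 0 argument types


def dual(t: Ch) -> Ch:
    """Dual type: swap input and output at the top level; arguments are unchanged."""
    return Ch("i" if t.mode == "o" else "o", t.args)


def dual_seq(ts: Tuple[Ch, ...]) -> Tuple[Ch, ...]:
    """Pointwise dual of a sequence T_1, ..., T_n."""
    return tuple(dual(t) for t in ts)


unit_out = Ch("o")                      # ch^o[]
t = Ch("o", (unit_out, Ch("i")))        # ch^o[ch^o[], ch^i[]]
```

Because each type has exactly one mode, the record cannot express a channel that is used for both input and output, which mirrors i/o-separation in this toy encoding.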


Processes. Let $\mathcal{N}$ be a denumerable set of names, ranged over by $x$, $y$ and $z$.
Each name is either input-only or output-only, because of i/o-separation.
The set of processes, ranged over by $P$, $Q$ and $R$, is defined by

$$P, Q, R ::= 0 \mid (P \mid Q) \mid (\nu_T\, xy)P \mid \bar{x}\langle \vec{y} \rangle \mid {!x(\vec{y}).P}.$$

The notion of free names, as well as bound names, is defined as usual. The set
of free names (resp. bound names) of $P$ is written as $\mathrm{fn}(P)$ (resp. $\mathrm{bn}(P)$). We
allow tacit renaming of bound names, and identify α-equivalent processes.

The meaning of the constructs should be clear, except for $(\nu_T\, xy)P$, which
is less common. The process $0$ is the inaction; $P \mid Q$ is a parallel composition;
$\bar{x}\langle \vec{y} \rangle$ is an output; and $!x(\vec{y}).P$ is a replicated input. The restriction $(\nu_T\, xy)P$
hides the names $x$ and $y$ of type $T$ and $T^\perp$ and, at the same time, establishes a
connection between $x$ and $y$. Communication takes place only over bound names
explicitly connected by $\nu$. This is in contrast to the conventional π-calculus, in
which input-output correspondence is a priori (i.e. $\bar{a}$ is the output to $a$).
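
The process grammar and the usual notion of free names can be sketched as a small abstract syntax tree. This is an illustration only: the class names (`Nil`, `Par`, `Nu`, `Out`, `RepIn`), the function `fn`, and the string encoding of names such as `"a_bar"` are assumptions made for this sketch, and type annotations on the restriction are omitted.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Nil:                 # the inaction 0
    pass


@dataclass(frozen=True)
class Par:                 # parallel composition P | Q
    left: object
    right: object


@dataclass(frozen=True)
class Nu:                  # restriction: binds x and y in body and connects them
    x: str
    y: str
    body: object


@dataclass(frozen=True)
class Out:                 # output on chan carrying args
    chan: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class RepIn:               # replicated input on chan; params are bound in body
    chan: str
    params: Tuple[str, ...]
    body: object


def fn(p) -> FrozenSet[str]:
    """Free names of a process, defined by structural recursion."""
    if isinstance(p, Nil):
        return frozenset()
    if isinstance(p, Par):
        return fn(p.left) | fn(p.right)
    if isinstance(p, Nu):
        return fn(p.body) - frozenset({p.x, p.y})
    if isinstance(p, Out):
        return frozenset({p.chan, *p.args})
    if isinstance(p, RepIn):
        # The subject is free; only the parameters are bound in the body.
        return frozenset({p.chan}) | (fn(p.body) - frozenset(p.params))
    raise TypeError(f"not a process: {p!r}")


# A replicated input on a that outputs on b_bar, under a restriction linking a_bar and a.
forwarder = RepIn("a", (), Out("b_bar", ()))
linked = Nu("a_bar", "a", Par(forwarder, Out("a_bar", ())))
```

Note that the restriction removes both of its names at once, reflecting that a single binder hides the connected pair.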
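$$\frac{}{\Gamma \vdash 0 : \diamond} \qquad \frac{\Gamma \vdash P : \diamond \quad \Gamma \vdash Q : \diamond}{\Gamma \vdash P \mid Q : \diamond} \qquad \frac{\Gamma, x : \mathbf{ch}^o[\vec{T}], y : \mathbf{ch}^i[\vec{T}] \vdash P : \diamond}{\Gamma \vdash (\nu_{\mathbf{ch}^o[\vec{T}]}\, xy)P : \diamond}$$

$$\frac{(x : \mathbf{ch}^i[\vec{T}]) \in \Gamma \quad \Gamma, \vec{y} : \vec{T} \vdash P : \diamond}{\Gamma \vdash {!x(\vec{y}).P} : \diamond} \qquad \frac{(x : \mathbf{ch}^o[\vec{T}]) \in \Gamma \quad (\vec{y} : \vec{T}) \subseteq \Gamma}{\Gamma \vdash \bar{x}\langle \vec{y} \rangle : \diamond}$$


Fig. 1. Typing rules for processes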


The $\pi_F$-calculus does not have non-replicated input $x(\vec{y}).P$.


Typing Rules. A type environment $\Gamma$ is a finite sequence of type bindings of
the form $x : T$. We assume the names in $\Gamma$ are pairwise distinct. If $\vec{x} = x_1, \dots, x_n$
and $\vec{T} = T_1, \dots, T_n$, we write $\vec{x} : \vec{T}$ for $x_1 : T_1, \dots, x_n : T_n$. We write $(\vec{x} : \vec{T}) \subseteq \Gamma$
to mean $x_i : T_i \in \Gamma$ for every $i$.

A type judgement is of the form $\Gamma \vdash P : \diamond$, meaning that $P$ is a well-typed
process under $\Gamma$. The typing rules are listed in Fig. 1.


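Notation 1. We define $(\nu_{\mathbf{ch}^i[\vec{T}]}\, xy)P$ as $(\nu_{\mathbf{ch}^o[\vec{T}]}\, yx)P$; then $(\nu_T\, xy)P$ is defined
for every $T$. We abbreviate $(\nu_{T_1}\, x_1 y_1) \dots (\nu_{T_n}\, x_n y_n)P$ as $(\nu_{\vec{T}}\, \vec{x}\vec{y})P$. We often
omit type annotations and write $(\nu xy)$ for $(\nu_T\, xy)$ and $(\nu \vec{x}\vec{y})$ for $(\nu_{\vec{T}}\, \vec{x}\vec{y})$. We
use $a$ and $b$ for names of input channel types and $\bar{a}$ and $\bar{b}$ for output. Note that
$a$ and $\bar{a}$ are connected only if they are bound by the same occurrence of $\nu$.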


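Operational Semantics. Structural congruence, written $\equiv$, is the smallest
congruence relation on processes that satisfies the following rules:

$$P \mid 0 \equiv P \qquad P \mid Q \equiv Q \mid P \qquad (P \mid Q) \mid R \equiv P \mid (Q \mid R)$$
$$(\nu xy)(P \mid Q) \equiv ((\nu xy)P) \mid Q \qquad (\nu wx)(\nu yz)P \equiv (\nu yz)(\nu wx)P$$

where $x, y \notin \mathrm{fn}(Q)$ in the fourth rule and $w, x, y, z$ are distinct in the fifth rule.
The reduction relation on processes, written $\longrightarrow$, is defined by the base rule

$$(\nu \vec{w}\vec{z})(\nu \bar{a}a)(!a(\vec{x}).P \mid \bar{a}\langle \vec{y} \rangle \mid Q) \longrightarrow (\nu \vec{w}\vec{z})(\nu \bar{a}a)(!a(\vec{x}).P \mid P\{\vec{y}/\vec{x}\} \mid Q)$$

(where $P\{\vec{y}/\vec{x}\}$ is the capture-avoiding substitution) and the structural rule
which concludes $P \longrightarrow Q$ from $\exists P' Q'.\ P \equiv P' \longrightarrow Q' \equiv Q$. Note that, unlike
conventional π-calculi, communication only occurs over bound names connected
by $\nu$. We write $\longrightarrow^*$ for the reflexive and transitive closure of $\longrightarrow$.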

It should be clear that deadlocks and racy communications can be expressed
in $\pi_F$. An example of a race is $(\nu \bar{a}a)(\bar{a}\langle \vec{y} \rangle \mid {!a(\vec{x}).P} \mid {!a(\vec{x}).Q})$, where two input
actions are trying to consume the output regarded as a resource. A similar
process $(\nu \bar{a}a)(!a(\vec{x}).P \mid \bar{a}\langle \vec{y} \rangle \mid \bar{a}\langle \vec{z} \rangle)$ does not have a race since the receiver $!a(\vec{x}).P$
is replicated. In general, race conditions on output actions do not occur in $\pi_F$.
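
The two examples can be explored with a small, deliberately simplified reducer. It is a sketch under stated assumptions, not the calculus itself: a process is flattened into a tuple of parallel atoms, the restrictions are represented by a hypothetical `links` dictionary from each bound output name to its connected input name, and substitution assumes that bound names are distinct from all free names so that no capture can occur. The helper names `substitute` and `successors` are likewise illustrative.

```python
from typing import Dict, List, Tuple

# An atom is either ("out", channel, args) for an output
# or ("rep", channel, params, body) for a replicated input,
# where body is a tuple of atoms composed in parallel.
Soup = Tuple[tuple, ...]


def substitute(body: Soup, mapping: Dict[str, str]) -> Soup:
    """Rename free names in body. Assumes bound names differ from all free
    names, so capture cannot occur (a simplifying assumption)."""
    result = []
    for atom in body:
        if atom[0] == "out":
            _, chan, args = atom
            result.append(("out", mapping.get(chan, chan),
                           tuple(mapping.get(a, a) for a in args)))
        else:
            _, chan, params, inner = atom
            visible = {k: v for k, v in mapping.items() if k not in params}
            result.append(("rep", mapping.get(chan, chan), params,
                           substitute(inner, visible)))
    return tuple(result)


def successors(links: Dict[str, str], soup: Soup) -> List[Soup]:
    """All one-step reducts. An output fires only if its channel is linked,
    mirroring that communication needs names connected by a restriction."""
    reducts = []
    for i, atom in enumerate(soup):
        if atom[0] != "out" or atom[1] not in links:
            continue
        _, chan, args = atom
        for receiver in soup:
            if (receiver[0] == "rep" and receiver[1] == links[chan]
                    and len(receiver[2]) == len(args)):
                _, _, params, body = receiver
                spawned = substitute(body, dict(zip(params, args)))
                # Consume the output, keep the replicated receiver, add a copy of its body.
                reducts.append(soup[:i] + soup[i + 1:] + spawned)
    return reducts


links = {"a_bar": "a"}
recv_p = ("rep", "a", (), (("out", "p_bar", ()),))
recv_q = ("rep", "a", (), (("out", "q_bar", ()),))
send = ("out", "a_bar", ())

racy = (send, recv_p, recv_q)        # two receivers compete for one output
no_race = (recv_p, send, send)       # one replicated receiver, two outputs
```

In this encoding, `set(successors(links, racy))` contains two different reducts, one per receiver, whereas the reducts of `no_race` coincide regardless of which output is consumed first.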
2.2 Equivalences on Processes


To establish a Curry-Howard-Lambek correspondence is to find a nice algebraic
or categorical structure of terms. For example, the original Curry-Howard-Lambek
correspondence reveals the cartesian closed structure of λ-terms.

Such a nice structure becomes visible only when appropriate notions
of composition and of equivalence have been identified, such as substitution and
βη-equivalence for the λ-calculus.

As for process calculi, the so-called “parallel composition + hiding” paradigm [17]
has been used to compose processes. Given typed processes

$$\vec{x} : \vec{T},\ \vec{y} : \vec{S} \vdash P : \diamond \qquad \text{and} \qquad \vec{w} : \vec{S}^\perp,\ \vec{u} : \vec{U} \vdash Q : \diamond,$$

their composite via $(\vec{y}, \vec{w})$ is defined as

$$\vec{x} : \vec{T},\ \vec{u} : \vec{U} \vdash (\nu_{\vec{S}}\, \vec{y}\vec{w})(P \mid Q) : \diamond.$$

This kind of composition appears quite often in logical studies of π-calculi [1,
5,19]. It also plays a central role in the interaction category paradigm proposed by
Abramsky, Gay and Nagarajan [2].
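
At the level of type environments, this composition can be sketched as follows. The sketch only computes the interface of the composite and checks that the hidden names have dual types; it does not build or type-check processes. The names `Ch`, `dual` and `compose_interface`, and the use of dictionaries for environments, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Ch:
    mode: str                      # "o" (output) or "i" (input)
    args: Tuple["Ch", ...] = ()


def dual(t: Ch) -> Ch:
    return Ch("i" if t.mode == "o" else "o", t.args)


def compose_interface(env_p: Dict[str, Ch], env_q: Dict[str, Ch],
                      ys: Sequence[str], ws: Sequence[str]) -> Dict[str, Ch]:
    """Interface of the composite: each y in ys (from P) is hidden and connected
    to the matching w in ws (from Q), which must have the dual type."""
    if len(ys) != len(ws):
        raise ValueError("ys and ws must have the same length")
    for y, w in zip(ys, ws):
        if env_q[w] != dual(env_p[y]):
            raise TypeError(f"{y} and {w} do not have dual types")
    visible = {x: t for x, t in env_p.items() if x not in ys}
    for u, t in env_q.items():
        if u in ws:
            continue
        if u in visible:
            raise ValueError(f"name clash on {u}")
        visible[u] = t
    return visible


env_p = {"x": Ch("i"), "y": Ch("o")}     # x : T, y : S
env_q = {"w": Ch("i"), "u": Ch("o")}     # w : dual of S, u : U
composite = compose_interface(env_p, env_q, ["y"], ["w"])
```

Only the names that are not hidden remain in the resulting environment, which is what allows the composite to be read as a process with a smaller interface.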

So it remains to determine an equivalence on π-calculus processes appropriate
for our purpose. This subsection approaches the problem from two directions:


— Examining behavioural equivalences proposed and studied in the literature 
— Developing a new equivalence based on categorical considerations 


Let us clarify the notion of equivalence discussed below. An equation-in-context
is a judgement of the form $\Gamma \vdash P = Q$, where $\Gamma \vdash P : \diamond$ and $\Gamma \vdash Q : \diamond$.
An equivalence $\mathcal{E}$ is a set of equations-in-context that is reflexive, transitive and
symmetric (e.g. $(\Gamma \vdash P = P) \in \mathcal{E}$ for every $\Gamma \vdash P : \diamond$).


Behavioural Equivalences. As mentioned above, we are interested in the
structure of $\pi_F$-processes modulo existing behavioural equivalences. Among the
various behavioural equivalences, we start by studying barbed congruence [32],
which is one of the most widely used equivalences.

We define (asynchronous and weak) barbed congruence for $\pi_F$. For each name
$\bar{a}$, we write $P{\downarrow_{\bar{a}}}$ if $P \equiv (\nu \vec{x}\vec{y})(\bar{a}\langle \vec{z} \rangle \mid Q)$ and $\bar{a}$ is free, and $P{\Downarrow_{\bar{a}}}$ if $\exists Q.\ P \longrightarrow^* Q{\downarrow_{\bar{a}}}$.
A $(\Gamma/\Delta)$-context is a context $C$ such that $\Gamma \vdash C[P] : \diamond$ for every $\Delta \vdash P : \diamond$.


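Definition 1. A barbed bisimulation is a symmetric relation $\mathcal{R}$ on processes
such that, whenever $P \mathrel{\mathcal{R}} Q$, (1) $P{\downarrow_{\bar{a}}}$ implies $Q{\Downarrow_{\bar{a}}}$ and (2) $P \longrightarrow P'$ implies
$\exists Q'.\ (Q \longrightarrow^* Q') \wedge (P' \mathrel{\mathcal{R}} Q')$. Barbed bisimilarity $\dot{\approx}$ is the largest barbed bisimulation.
Typed processes $\Delta \vdash P : \diamond$ and $\Delta \vdash Q : \diamond$ are barbed congruent at $\Delta$,
written $\Delta \vdash P \cong^c Q$, if $C[P] \mathrel{\dot{\approx}} C[Q]$ for every $(\Gamma/\Delta)$-context $C$.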


Let us consider a category-like structure C in which an object is a type and
a morphism is an equivalence class of $\pi_F$-processes modulo barbed congruence.
More precisely, a morphism from $T$ to $S$ is a process $x : T, y : S^\perp \vdash P : \diamond$ modulo
barbed congruence (and renaming of free names $x$ and $y$). Then the composition
(i.e. “parallel composition + hiding”) is well-defined on equivalence classes,
because barbed congruence is a congruence. This is a fairly natural setting.

We have a strikingly negative result. 


Theorem 1. C is not a category. 


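Proof. In every category, if $f : A \to A$ is a left-identity on $A$ (i.e. $f \circ g = g$ for
every $g : A \to A$), then $f$ is the identity on $A$. The process $a : \mathbf{ch}^i[], \bar{b} : \mathbf{ch}^o[] \vdash
{!a().\bar{b}\langle\rangle} : \diamond$ seen as a morphism $(\mathbf{ch}^i[]) \to (\mathbf{ch}^i[])$ is a left-identity but not the
identity. The former means that $c : \mathbf{ch}^i[], \bar{b} : \mathbf{ch}^o[] \vdash ((\nu \bar{a}a)(!a().\bar{b}\langle\rangle \mid P)) \cong^c
P\{\bar{b}/\bar{a}\}$ for every $c : \mathbf{ch}^i[], \bar{a} : \mathbf{ch}^o[] \vdash P : \diamond$, which is a consequence of the replicator
theorems [35]. To prove the latter, observe that $(\nu \bar{b}b)(!a().\bar{b}\langle\rangle \mid 0)$ and $0$
are not barbed congruent. Indeed the context $C := (\nu \bar{a}a)(\bar{a}\langle\rangle \mid {!a().\bar{o}\langle\rangle} \mid [\,])$
distinguishes the processes, where $\bar{o}$ is the observable.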


Note that a race condition is essential for the proof, specifically, for the part
proving that the process $!a().\bar{b}\langle\rangle$ is not the identity. A race condition occurs in
$C[(\nu \bar{b}b)(!a().\bar{b}\langle\rangle \mid 0)]$, where $\bar{a}$ in $C$ has two receivers.


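The process $!a().\bar{b}\langle\rangle$ is called a forwarder, and forwarders will play a central
role in this paper. Its general form is $a \hookrightarrow \bar{b} := {!a(\vec{x}).\bar{b}\langle \vec{x} \rangle}$. When $x : T$ and
$y : T^\perp$, we write $x \rightleftharpoons y$ to mean $x \hookrightarrow y$ if $T = \mathbf{ch}^i[\vec{S}]$ and otherwise $y \hookrightarrow x$.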

Remark 1. The argument in the proof of Theorem 1 is widely applicable to i/o-typed
calculi, not specific to $\pi_F$. In particular, i/o-separation (i.e. absence of
$\mathbf{ch}^{i/o}[\vec{T}]$) is not the cause, but the existence of $\mathbf{ch}^o[\vec{T}]$ or $\mathbf{ch}^i[\vec{T}]$ is.


Remark 2. Session-typed calculi in Caires, Pfenning and Toninho [8,9], which 
correspond to linear logic, do not seem to suffer from this problem. In our under- 
standing, this is because of race-freedom of their calculi. 


To obtain a category, we should think of a coarser equivalence that identifies
$(\nu \bar{b}b)(!a().\bar{b}\langle\rangle \mid 0)$ with $0$. Such an equivalence should be very coarse; even must-testing
equivalence [11] fails to equate them. As far as we have checked, only
may-testing equivalence [11] defined below satisfies the requirement.


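Definition 2. Typed processes $\Delta \vdash P : \diamond$ and $\Delta \vdash Q : \diamond$ are may-testing
equivalent at $\Delta$, written $\Delta \vdash P =_{\mathrm{may}} Q$, if $C[P]{\Downarrow_{\bar{a}}} \iff C[Q]{\Downarrow_{\bar{a}}}$ for every
$(\Gamma/\Delta)$-context $C$ and name $\bar{a}$.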


As we shall see, $\pi_F$-processes modulo may-testing equivalence behave well.
May-testing equivalence is, however, often too coarse. 


Category-Driven Approach. In this approach, we first guess an appropriate
categorical structure sufficient for interpreting $\pi_F$, based on intuitions discussed
in the Introduction (see also Sect. 3.1), and then design an equivalence so that it is
sound and complete with respect to the categorical semantics.

Figure 2 defines the equivalence, described as a set of rules. A $\pi_F$-theory is
an equivalence that behaves well from the categorical perspective.
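$$\frac{a \notin \mathrm{fn}(P, C) \quad \bar{a} \notin \mathrm{bn}(C)}{\Gamma \vdash (\nu \bar{a}a)(!a(\vec{x}).P \mid C[\bar{a}\langle \vec{y} \rangle]) = (\nu \bar{a}a)(!a(\vec{x}).P \mid C[P\{\vec{y}/\vec{x}\}])} \ (\text{E-BETA})$$

$$\frac{a, \bar{a} \notin \mathrm{fn}(P)}{\Gamma \vdash (\nu \bar{a}a)\, {!a(\vec{y}).P} = 0} \ (\text{E-GC}) \qquad \frac{a, \bar{a} \notin \mathrm{fn}(\bar{x}\langle \vec{z} \rangle)}{\Gamma \vdash \bar{x}\langle \vec{z} \rangle = (\nu \bar{a}a)(a \hookrightarrow \bar{b} \mid \bar{x}\langle \vec{z}\{\bar{a}/\bar{b}\} \rangle)} \ (\text{E-FOUT})$$

$$\frac{b, \bar{a} \notin \mathrm{fn}(P)}{\Gamma \vdash (\nu \bar{a}a)(b \hookrightarrow \bar{a} \mid P) = P\{b/a\}} \ (\text{E-ETA})$$

$$\frac{P \equiv Q}{\Gamma \vdash P = Q} \ (\text{E-SCONG}) \qquad \frac{\Delta \vdash P = Q \quad C : \Gamma/\Delta\text{-context}}{\Gamma \vdash C[P] = C[Q]} \ (\text{E-CTX})$$


Fig. 2. Inference rules of equations-in-context. Each rule has implicit assumptions that
both sides of the equation are well-typed processes.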


Definition 3. An equivalence $\mathcal{E}$ is a $\pi_F$-theory if it is closed under the rules in
Fig. 2. Any set $Ax$ of equations-in-context has the minimum theory $\mathrm{Th}(Ax)$ that
contains $Ax$. We write $Ax \rhd \Gamma \vdash P = Q$ if $(\Gamma \vdash P = Q) \in \mathrm{Th}(Ax)$.


Let us examine each rule in Fig. 2.
The rule (E-BETA) should be compared with the reduction relation. When
$C = ([\,] \mid Q)$, then (E-BETA) claims

$$(\nu \bar{a}a)(!a(\vec{x}).P \mid \bar{a}\langle \vec{y} \rangle \mid Q) = (\nu \bar{a}a)(!a(\vec{x}).P \mid P\{\vec{y}/\vec{x}\} \mid Q)$$

provided that $a \notin \mathrm{fn}(P, Q)$, which is indeed an instance of the reduction.

A significant difference from reduction is the side condition. It is essential
in the presence of race conditions. Without the side condition, every π_F-theory
would be forced to contain the symmetric and transitive closure of the reduction
relation; thus it would identify P | (νāa)(!a().P | !a().Q) with
Q | (νāa)(!a().P | !a().Q) for all processes P and Q (where ā, a are fresh), because

\[
\begin{aligned}
(\nu\bar{a}a)(\bar{a}\langle\rangle \mid\ !a().P \mid\ !a().Q) &\longrightarrow P \mid (\nu\bar{a}a)(!a().P \mid\ !a().Q) \\
(\nu\bar{a}a)(\bar{a}\langle\rangle \mid\ !a().P \mid\ !a().Q) &\longrightarrow Q \mid (\nu\bar{a}a)(!a().P \mid\ !a().Q).
\end{aligned}
\]

The side condition prevents π_F-theories from collapsing.
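The race described above can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: channels are nullary, process bodies are opaque labels, and the names `Send`, `Recv` and `one_step` are hypothetical helpers, not part of the calculus. With two replicated receivers on the same hidden channel the single message has two distinct outcomes, whereas a unique receiver (the situation that the side condition enforces) gives exactly one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Send:
    """Nullary output on a channel, e.g. the message a<>."""
    chan: str


@dataclass(frozen=True)
class Recv:
    """Replicated input !a().body; the body is kept abstract as a label."""
    chan: str
    body: str


def one_step(soup):
    """Return every soup reachable by one communication step.

    A soup is a tuple of Send, Recv and plain labels (already spawned bodies),
    all understood to sit under one restriction of the channel.
    """
    results = set()
    for i, item in enumerate(soup):
        if not isinstance(item, Send):
            continue
        rest = soup[:i] + soup[i + 1:]  # the message is consumed
        for other in rest:
            if isinstance(other, Recv) and other.chan == item.chan:
                # The replicated receiver stays; a copy of its body is spawned.
                results.add(tuple(sorted(rest + (other.body,), key=repr)))
    return results


racy = (Send("a"), Recv("a", "P"), Recv("a", "Q"))
race_free = (Send("a"), Recv("a", "P"))
outcomes_racy = one_step(racy)            # P spawned, or Q spawned
outcomes_race_free = one_step(race_free)  # only P can be spawned
```

Identifying both outcomes of `racy` with their common source is what would force P | R and Q | R to be equated for arbitrary P and Q.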

Another, relatively minor, difference is that application of (E-BETA) is not
limited to contexts of the form [] | Q. This kind of extension can be found in,
for example, work by Honda and Laurent [19] studying the π-calculus from a logical
perspective.

The rule (E-GC) runs “garbage-collection”. Because no one can send a message
to the hidden name a, the process !a(x⃗).P will never be invoked and thus
is safely discarded. This rule is sound with respect to many behavioural
equivalences, including barbed congruence. Rules of this kind often appear in the
literature studying logical aspects of concurrent calculi (as in Honda and
Laurent [19] and Wadler [48]). There is, however, a subtle difference in the side
condition: (E-GC) requires that a and ā do not appear at all in P.

The rule (E-FOUT) can be seen as the η-rule of abstractions, as in the
λ-calculus and in the higher-order π-calculus [39]. In the latter, an output name b̄
can be identified with an abstraction (y⃗).b̄⟨y⃗⟩. Then we have, for example,

\[
(\nu\bar{a}a)(a \hookrightarrow \bar{b} \mid \bar{c}\langle\bar{a}\rangle)
  = (\nu\bar{a}a)(a \hookrightarrow \bar{b} \mid \bar{c}\langle(\vec{y}).\bar{a}\langle\vec{y}\rangle\rangle)
  = \bar{c}\langle(\vec{y}).\bar{b}\langle\vec{y}\rangle\rangle
  = \bar{c}\langle\bar{b}\rangle
\]

where we use (E-BETA) and (E-GC) in the second step. An important usage
of (E-FOUT) is to replace an output of free names with that of bound names.
This kind of operation has been studied in [7,28] as a part of translations from
the π-calculus to its local/internal fragments.³

The rule (E-ETA) requires forwarders to be left-identities, directly describing
the requirement discussed above.⁴
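As a small worked instance, (E-ETA) already shows that forwarders compose. The derivation below is a sketch that assumes the rule reads (νāa)(b ↪ ā | P) = P{b/a} with side condition b, ā ∉ fn(P), and that a, b and c are pairwise distinct names; it instantiates P with the forwarder a ↪ c̄, whose free names are a and c̄.

```latex
\begin{aligned}
(\nu\bar{a}a)(b \hookrightarrow \bar{a} \mid a \hookrightarrow \bar{c})
  &= (a \hookrightarrow \bar{c})\{b/a\}
     && \text{by (E-ETA), since } b, \bar{a} \notin \mathrm{fn}(a \hookrightarrow \bar{c}) = \{a, \bar{c}\} \\
  &= b \hookrightarrow \bar{c}
     && \text{by the definition of substitution.}
\end{aligned}
```

Composing two forwarders through a hidden channel thus yields the direct forwarder, which is the behaviour expected of identity morphisms.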

The rules (E-SCONG) and (E-CTX) are easy to understand. The former
requires that structurally congruent processes be identified; the latter
says that a π_F-theory is a congruence.

These rules can also be justified from the operational viewpoint. Well-known
results on the i/o-typed π-calculus (see, e.g., [35,43]) show the following
propositions.


Proposition 1. Barbed congruence is closed under all rules but (E-ETA). 


Proposition 2. May-testing equivalence is a π_F-theory.


In particular, the latter means that may-testing equivalence is in the scope of 
the categorical framework of this paper; see Theorem 5. 


3 Categorical Semantics 


This section introduces the class of compact closed Freyd categories and discusses
the interpretation of the π_F-calculus in these categories. We show that the
categorical semantics is sound and complete with respect to the equational theory
given in Sect. 2.2, and that the syntax of the π_F-calculus induces a model.

This section, by its nature, is slightly theoretical compared with other sec- 
tions. Section 3.1 explains the ideas of this section without heavily using cate- 
gorical notions; the subsequent subsections require familiarity with categorical 
type theory. 


3.1 Overview 


As mentioned in Sect. 1, the categorical model of π_F is a compact closed Freyd
category, which has both closed Freyd and compact closed structures. Here we
informally discuss what a compact closed Freyd category is and how to interpret
π_F in it by using a syntactic representation.

³ Free outputs can be eliminated from π_F-processes by using the rules (E-FOUT) and
(E-ETA), i.e. external mobility can be encoded by internal mobility [7,40]. If the
calculus is local [28,49], then we do not need (E-ETA) to eliminate free outputs.

⁴ A forwarder behaves as a right-identity with respect to every π_F-theory. This is a
consequence of rules (E-BETA), (E-GC) and (E-FOUT).

A closed Freyd category is a model of higher-order programs with side effects.
It has, among others, the structures to interpret the function type A ⇒ B and
its constructor and destructor, namely, abstraction λx.t and application t u. It
also has a mechanism for unrestricted duplication of variables; in terms of logic,
contraction is admissible.

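A compact closed category can be seen as MLL [14] with the left rule:

\[
\frac{\Gamma, A^{*}, A \vdash I}{\Gamma \vdash I}
\qquad\qquad
\frac{\Gamma \vdash A^{*} \qquad \Delta \vdash A}{\Gamma, \Delta \vdash I}
\]

(The right rule is the companion, which itself is derivable in MLL.)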
A compact closed Freyd category has all the constructs. It has the structures
corresponding to the following type constructors:

(closed Freyd) I, A ⊗ B, A ⇒ B      (compact closed) I, A ⊗ B, A*.

Note that the pair type A ⊗ B (as well as the unit I) coming from the closed
Freyd structure is identified with that from the compact closed structure.
Inference rules for a compact closed Freyd category are those for functional languages
together with the above rules of the compact closed structure.

Interpreting π_F in a compact closed Freyd category means interpreting it by using
these constructs. As mentioned in Sect. 1, following Sangiorgi [39], we regard

- an output ā⟨x⃗⟩ as an application of a function ā to a tuple ⟨x⃗⟩, and
- an input !a(x⃗).P as an abstraction (x⃗).P (or λx⃗.P) located at a.


We interpret the output action by using function application. Hence the type
ch^o[T] is regarded as a function type T ⇒ I (where the unit type I is the type
for processes, i.e. ⋄); then the typing rule for output actions becomes

\[
\frac{\Gamma, \bar{a} : (T \Rightarrow I), x : T \vdash \bar{a} : T \Rightarrow I
      \qquad
      \Gamma, \bar{a} : (T \Rightarrow I), x : T \vdash x : T}
     {\Gamma, \bar{a} : (T \Rightarrow I), x : T \vdash \bar{a}\langle x\rangle : I}
\]

The type ch^i[T] is understood as (T ⇒ I)*; the input-prefixing rule becomes

\[
\frac{\Gamma, a : (T \Rightarrow I)^{*} \vdash a : (T \Rightarrow I)^{*}
      \qquad
      \dfrac{\Gamma, a : (T \Rightarrow I)^{*}, x : T \vdash P : I}
            {\Gamma, a : (T \Rightarrow I)^{*} \vdash (x).P : T \Rightarrow I}}
     {\Gamma, a : (T \Rightarrow I)^{*} \vdash\ !a(x).P : I}
\]

This derivation directly expresses the intuition that an input-prefixing is abstraction
followed by allocation; here allocation is interpreted by using the compact
closed structure, i.e. connection of ports. The name restriction also has a natural
derivation:

\[
\frac{\Gamma, a : (T \Rightarrow I)^{*}, \bar{a} : (T \Rightarrow I) \vdash P : I}
     {\Gamma \vdash (\nu\bar{a}a)P : I}
\]

3.2 Compact Closed Freyd Category


Let us formalise the ideas given in Sect. 3.1. Hereafter in this section, we assume 
basic knowledge of category theory and of categorical type theory. 

We recall the definitions of compact closed category and closed Freyd cat- 
egory. For simplicity, the structures below are strict and chosen; a functor is 
required to preserve the chosen structures on the nose. 


Definition 4 (Compact closed category [21]). Let (C, ⊗, I) be a symmetric
strict monoidal category. The dual of an object A in C is an object A* equipped
with a unit η_A : I → A ⊗ A* and a counit ε_A : A* ⊗ A → I that satisfy the “triangle
identities” (η_A ⊗ id_A); (id_A ⊗ ε_A) = id_A and (id_{A*} ⊗ η_A); (ε_A ⊗ id_{A*}) = id_{A*}. The
category C is compact closed if each object is equipped with a chosen dual.


Definition 5 (Closed Freyd category [37]). A Freyd category is given by
(1) a category with chosen finite products (C, ⊗, I), called the value category, (2) a
symmetric strict monoidal category (K, ⊗, I, symm), called the producer category,
and (3) an identity-on-object strict symmetric monoidal functor J : C → K. A
Freyd category is a closed Freyd category if the functor J(−) ⊗ A : C → K has
the (chosen) right adjoint A ⇒ − : K → C for every object A. We write Λ_{A,B,C}
for the natural bijection K(J(A) ⊗ B, C) → C(A, B ⇒ C) and eval_{A,B} for
Λ⁻¹(id_{A⇒B}) : (A ⇒ B) ⊗ A → B in K.


Remark 3. The above definition is a restriction of the original one [37], in which
K is a premonoidal [36] category. This change reflects the concurrency of the
calculus. In fact, it validates the following law, expressed in the syntax of the
computational λ-calculus [33],

\[
\mathbf{let}\ x = M\ \mathbf{in}\ \mathbf{let}\ y = N\ \mathbf{in}\ L
  \;=\;
\mathbf{let}\ y = N\ \mathbf{in}\ \mathbf{let}\ x = M\ \mathbf{in}\ L.
\]

Then one can evaluate M by using the left form and N by using the right form.
This law allows us to evaluate M and N in arbitrary order, or concurrently.


We now introduce the categorical structure corresponding to the π_F-calculus.


Definition 6 (Compact closed Freyd category). A compact closed Freyd
category is a Freyd category J : C → K such that (1) K is compact closed, and
(2) J has the (chosen) right adjoint I ⇒ − : K → C.

We shall often write J for a compact closed Freyd category J : C ⇄ K.
A compact closed Freyd category is a closed Freyd category:

\[
K(J(A) \otimes B, C) \;\cong\; K(J(A), B^{*} \otimes C) \;\cong\; C(A, I \Rightarrow (B^{*} \otimes C)).
\]


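Example 1. The most basic example of a compact closed Freyd category is (the
strict monoidal version of) J : Sets ⇄ Rel : P. Here J is the identity-on-object
functor that maps a function to its graph and P is the “power set functor”
that maps a relation R ⊆ A × B to the function P(R) := {(S_A, S_B) | S_B =
{b | a ∈ S_A, a R b}}. Another example is obtained by replacing sets with
posets, functions with monotone functions and relations with downward closed
relations.

\[
\begin{aligned}
\llbracket \mathbf{ch}^{i}[T_1, \dots, T_n] \rrbracket
  &\overset{\mathrm{def}}{=} ((\llbracket T_1 \rrbracket \otimes \cdots \otimes \llbracket T_n \rrbracket) \Rightarrow I)^{*} \\
\llbracket \mathbf{ch}^{o}[T_1, \dots, T_n] \rrbracket
  &\overset{\mathrm{def}}{=} (\llbracket T_1 \rrbracket \otimes \cdots \otimes \llbracket T_n \rrbracket) \Rightarrow I \\[1ex]
\llbracket \Gamma \vdash 0 : \diamond \rrbracket
  &\overset{\mathrm{def}}{=} J(!_{\Gamma}) \\
\llbracket \Gamma \vdash\ !a(\vec{x}).P : \diamond \rrbracket
  &\overset{\mathrm{def}}{=} J(\langle \pi^{\Gamma}_{a},\ \Lambda(\llbracket \Gamma, \vec{x} : \vec{T} \vdash P : \diamond \rrbracket) \rangle);\ \epsilon_{\llbracket \vec{T} \rrbracket \Rightarrow I} \\
\llbracket \Gamma \vdash \bar{a}\langle\vec{x}\rangle : \diamond \rrbracket
  &\overset{\mathrm{def}}{=} J(\langle \pi^{\Gamma}_{\bar{a}}, \pi^{\Gamma}_{x_1}, \dots, \pi^{\Gamma}_{x_n} \rangle);\ \mathrm{eval}_{\llbracket \vec{T} \rrbracket, I} \\
\llbracket \Gamma \vdash P \mid Q : \diamond \rrbracket
  &\overset{\mathrm{def}}{=} J(\Delta_{\Gamma});\ (\llbracket \Gamma \vdash P : \diamond \rrbracket \otimes \llbracket \Gamma \vdash Q : \diamond \rrbracket) \\
\llbracket \Gamma \vdash (\nu \bar{x} y)P : \diamond \rrbracket
  &\overset{\mathrm{def}}{=} (\mathrm{id}_{\llbracket \Gamma \rrbracket} \otimes \eta_{\llbracket T \rrbracket});\ \llbracket \Gamma, \bar{x} : T, y : T^{\perp} \vdash P : \diamond \rrbracket
\end{aligned}
\]

Fig. 3. Interpretation of types and processes. Here !_Γ, Δ_Γ and π^Γ_a are maps in C
induced by the cartesian structure, namely, !_Γ : ⟦Γ⟧ → I is the terminal map,
Δ_Γ : ⟦Γ⟧ → ⟦Γ⟧ ⊗ ⟦Γ⟧ is the diagonal map and, when Γ = (y_1 : T_1, ..., y_n : T_n)
and a = y_i, the morphism π^Γ_a : ⟦Γ⟧ → ⟦T_i⟧ is the i-th projection. The interpretation
of a type environment x_1 : T_1, ..., x_n : T_n is ⟦T_1⟧ ⊗ ··· ⊗ ⟦T_n⟧.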
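The Sets/Rel example can be checked mechanically on small finite sets. The sketch below is an illustration only, with several assumptions of ours: relations are Python sets of (source, target) pairs, elements of a tensor product are flat tuples (so that the product is strict), the dual of a set is taken to be the set itself, and the helper names `compose`, `tensor`, `graph` and `power` are hypothetical. It verifies one triangle identity of Definition 4, the interchange law behind Remark 3, and the action of the power set functor on one subset.

```python
def compose(r, s):
    """Diagrammatic composition r;s of relations given as sets of (source, target) pairs."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def tensor(r, s):
    """Monoidal product; elements of A (x) B are flat tuples, so the product is strict."""
    return {(a + c, b + d) for (a, b) in r for (c, d) in s}


def identity(xs):
    return {((x,), (x,)) for x in xs}


A = {0, 1, 2}
id_A = identity(A)
# In Rel the dual A* can be taken to be A itself.
eta = {((), (a, a)) for a in A}   # unit   eta_A : I -> A (x) A*
eps = {((a, a), ()) for a in A}   # counit eps_A : A* (x) A -> I

# Triangle identity: (eta_A (x) id_A); (id_A (x) eps_A) = id_A
triangle = compose(tensor(eta, id_A), tensor(id_A, eps))


def graph(f, xs):
    """J: view a function on the finite set xs as a relation (its graph)."""
    return {((x,), (f(x),)) for x in xs}


def power(r, subset):
    """P(R): send a subset S_A to {b | a in S_A, a R b}."""
    return frozenset(b for (a, b) in r if a in subset)


succ_mod3 = graph(lambda x: (x + 1) % 3, A)
image = power(succ_mod3, {(0,), (1,)})

# Interchange law: the two components of a pair can be run in either order.
g = {((0,), (1,)), ((0,), (2,))}
left_first = compose(tensor(succ_mod3, id_A), tensor(id_A, g))
right_first = compose(tensor(id_A, g), tensor(succ_mod3, id_A))
```

Here `triangle` coincides with `id_A`, and `left_first` coincides with `right_first`, which is the monoidal (rather than merely premonoidal) behaviour that Remark 3 relies on.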


Example 2. A more sophisticated example is taken from Laird’s game-semantic
model of the π-calculus [22]. Precisely speaking, the model in [22] itself is not
compact closed Freyd, but its variant (with non-negative arenas) is. This model is
important since it is fully abstract w.r.t. may-testing equivalence [22, Theorem 1];
hence our framework has a model that captures may-testing equivalence.


3.3 Interpretation 


Given a compact closed Freyd category J : C ⇄ K, this section defines the
interpretation ⟦−⟧_J. It maps types and type environments to objects as usual, and a
well-typed process Γ ⊢ P : ⋄ to a morphism ⟦P⟧ : ⟦Γ⟧ → I in K (recall that the
tensor unit I is the interpretation of the type for processes).

Figure 3 defines the interpretation of types and processes. It simply formalises
the ideas presented in Sect. 3.1: for example, the interpretation of !a(x⃗).P is the
abstraction Λ (from the closed Freyd structure) followed by location ε (from the
compact closed structure). There are some points worth noting.


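- (A ⇒ I)* is not isomorphic to A* ⇒ I, A ⇒ I or I ⇒ A. Indeed (A ⇒ I)*
  cannot be simplified. Do not confuse it with the valid law I ⇒ (A*) ≅ A ⇒ I.

- A parallel composition is interpreted as a pair. Recall that the two components
  of a pair are evaluated in parallel in this setting (cf. Remark 3).

- All but the last rule use the cartesian structure of C in order to duplicate or
  discard the environment.

Example 3. Let us consider y : T ⊢ (νāa)(ā⟨y⟩ | !a(x).P) : ⋄, where a, ā, y ∉
fn(P) and a : ch^i[T]. By (E-BETA) and (E-GC), this process is equal to P{y/x}.
It is natural to expect that the interpretations of the two processes coincide;
indeed they do. As the following calculation indicates, our semantics factorises the
reduction into two steps: (1) the “transmission” of the closure λx.P by the
triangle identity of the compact closed structure, and (2) the β-reduction modelled
by eval of the closed Freyd structure:

\[
\begin{aligned}
&\llbracket y : T \vdash (\nu\bar{a}a)(\bar{a}\langle y\rangle \mid\ !a(x).P) : \diamond \rrbracket \\
&= (\mathrm{id}_{\llbracket T \rrbracket} \otimes \eta_{\llbracket \mathbf{ch}^{o}[T] \rrbracket});\
   \llbracket y : T, \bar{a} : \mathbf{ch}^{o}[T], a : \mathbf{ch}^{i}[T] \vdash \bar{a}\langle y\rangle \mid\ !a(x).P : \diamond \rrbracket \\
&= (\mathrm{id} \otimes \eta);\
   (\llbracket y : T, \bar{a} : \mathbf{ch}^{o}[T] \vdash \bar{a}\langle y\rangle : \diamond \rrbracket
    \otimes \llbracket a : \mathbf{ch}^{i}[T] \vdash\ !a(x).P : \diamond \rrbracket) \\
&= (\mathrm{id} \otimes \eta);\
   \bigl((\mathrm{symm}_{\llbracket T \rrbracket, \llbracket \mathbf{ch}^{o}[T] \rrbracket};\ \mathrm{eval}_{\llbracket T \rrbracket, I})
    \otimes ((\mathrm{id}_{\llbracket \mathbf{ch}^{i}[T] \rrbracket} \otimes J(\Lambda(\llbracket x : T \vdash P : \diamond \rrbracket)));\ \epsilon_{\llbracket \mathbf{ch}^{o}[T] \rrbracket})\bigr) \\
&= (\mathrm{id}_{\llbracket T \rrbracket} \otimes J(\Lambda(\llbracket x : T \vdash P : \diamond \rrbracket)));\
   \mathrm{symm}_{\llbracket T \rrbracket, \llbracket \mathbf{ch}^{o}[T] \rrbracket};\ \mathrm{eval}_{\llbracket T \rrbracket, I}
   && \text{(by the triangle identity)} \\
&= (J(\Lambda(\llbracket x : T \vdash P : \diamond \rrbracket)) \otimes \mathrm{id}_{\llbracket T \rrbracket});\ \mathrm{eval}_{\llbracket T \rrbracket, I} \\
&= \llbracket x : T \vdash P : \diamond \rrbracket
   && \text{(by the universality of eval)} \\
&= \llbracket y : T \vdash P\{y/x\} : \diamond \rrbracket.
\end{aligned}
\]

(Here we implicitly use derived rules for weakening and exchange.)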


Example 4. The interpretation of a forwarder a : ch^i[T], b̄ : ch^o[T] ⊢ a ↪ b̄ : ⋄ is
the counit ε_{⟦ch^o[T]⟧} : ⟦ch^o[T]⟧* ⊗ ⟦ch^o[T]⟧ → I in K, which is the one-sided form
of the identity. Recall that a forwarder is the identity in every π_F-theory.


The semantics is sound and complete. That is, a judgement Ax ▷ Γ ⊢
P = Q is provable if and only if Γ ⊢ P = Q is valid in all models J of Ax.

Here we define the related notions and prove soundness; completeness is the 
topic of the next subsection. 


Definition 7. An equational judgement Γ ⊢ P = Q is valid in J if ⟦Γ ⊢ P :
⋄⟧_J = ⟦Γ ⊢ Q : ⋄⟧_J. Given a set Ax of non-logical axioms, J is a model of Ax,
written J ⊨ Ax, if it validates all judgements in Ax. We write Ax ▷ Γ ⊩ P = Q
if Γ ⊢ P = Q is valid in every J such that J ⊨ Ax.


Theorem 2 (Soundness). If Ax ▷ Γ ⊢ P = Q, then Ax ▷ Γ ⊩ P = Q.


3.4 Term Model 


A term model is a category whose objects are type environments and whose mor- 
phisms are terms (i.e. processes in this setting). This section gives a construction 
of the term model, by which we show completeness. This subsection basically 
follows the standard arguments in categorical type theory; we mainly focus on 
the features unique to our model, giving a sketch to the common part. 

Given a set Ax of axioms, we define the term model J_Ax : C_Ax ⇄ K_Ax, which
we also write as Cl(Ax).

The definition of the producer category K_Ax follows the standard recipe.
As usual, its objects are finite lists of types. The monoidal product T⃗ ⊗ S⃗ is
the concatenation of the lists and the dual T⃗* is T⃗^⊥. Given objects T⃗ and S⃗,
a morphism from T⃗ to S⃗ is a process x⃗ : T⃗, y⃗ : S⃗^⊥ ⊢ P : ⋄ (modulo renaming
of variables x⃗ and y⃗). If Ax ▷ x⃗ : T⃗, y⃗ : S⃗^⊥ ⊢ P = Q is provable, then P and
Q are regarded as the same morphism. Composition of morphisms is defined as
“parallel composition plus hiding”: for morphisms P : T⃗ → S⃗ and Q : S⃗ → U⃗,
i.e. processes such that x⃗ : T⃗, y⃗ : S⃗^⊥ ⊢ P : ⋄ and z⃗ : S⃗, w⃗ : U⃗^⊥ ⊢ Q : ⋄, their
composite is x⃗ : T⃗, w⃗ : U⃗^⊥ ⊢ (νy⃗z⃗)(P | Q) : ⋄. The monoidal product P ⊗ Q
of morphisms is the parallel composition P | Q. The identity, as well as the
symmetry of the monoidal product and the unit and counit of the compact closed
structure, is a parallel composition of forwarders: for example, the identity on
S⃗ is x⃗ : S⃗, y⃗ : S⃗^⊥ ⊢ x_1 ↪ y_1 | ··· | x_n ↪ y_n : ⋄ where n is the length of S⃗.
The facts that most structural morphisms are forwarders and that forwarders
compose are the keys to showing that K_Ax is a compact closed category.

We then turn to the definition of C_Ax, in which the definition of morphisms has
a subtle point. The objects of C_Ax are by definition the same as those of K_Ax, i.e. lists
of types. The definition of morphisms relies on the notion of values. The values
are defined by the grammar V ::= x | (x⃗).P, where P is a process and (x⃗).P is
called an abstraction. Typing rules for values are as follows:

\[
\frac{x : T \in \Gamma}{\Gamma \vdash x : T}
\qquad\qquad
\frac{\Gamma, \vec{x} : \vec{T} \vdash P : \diamond}{\Gamma \vdash (\vec{x}).P : \mathbf{ch}^{o}[\vec{T}]}
\]

(To understand the right rule, recall that ⟦ch^o[T⃗]⟧ = ⟦T⃗⟧ ⇒ I.) A morphism
from T⃗ to S⃗ = (S_1, ..., S_n) is an n-tuple (V_1, ..., V_n) of values such that
x⃗ : T⃗ ⊢ V_i : S_i for each i (modulo renaming of x⃗). Composition is intuitively defined by
“substitution followed by β-reduction”, whose definition is omitted here.⁵

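The functor J_Ax places the values to the channels. For example, let T⃗ =
(ch^i[T_1], ch^o[T_2]) and consider the morphism in C_Ax given by

\[
a : \mathbf{ch}^{i}[T_1],\ \bar{b} : \mathbf{ch}^{o}[T_2] \vdash (a, \bar{b}, (\vec{x}).P) : (\mathbf{ch}^{i}[T_1], \mathbf{ch}^{o}[T_2], \mathbf{ch}^{o}[\vec{S}])
\]

where S⃗ is the type for x⃗. The image of this morphism by the functor J_Ax is

\[
a : \mathbf{ch}^{i}[T_1],\ \bar{b} : \mathbf{ch}^{o}[T_2],\ \bar{c} : \mathbf{ch}^{o}[T_1],\ d : \mathbf{ch}^{i}[T_2],\ e : \mathbf{ch}^{i}[\vec{S}]
  \vdash a \hookrightarrow \bar{c} \mid d \hookrightarrow \bar{b} \mid\ !e(\vec{x}).P : \diamond.
\]

This example contains all three ways to place a value to a given channel.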


Theorem 3. Cl(Ax) is a compact closed Freyd category for every Ax.


In the model Cl(Ax), the interpretation of a process Γ ⊢ P : ⋄ is the
equivalence class that P belongs to. This fact leads to completeness.


⁵ Here is a subtle technical issue that we shall not address in this paper; see the long
version for the formal definition. We think, however, that this paragraph conveys a
precise intuition.


Theorem 4 (Completeness). If Ax ▷ Γ ⊩ P = Q, then Ax ▷ Γ ⊢ P = Q.


Theorem 5. There exists a compact closed Freyd category J that is fully
abstract w.r.t. may-testing equivalence, i.e. Γ ⊢ P =may Q iff ⟦P⟧_J = ⟦Q⟧_J.


Proof. Let J be the term model Cl(=may) and use Proposition 2.


3.5 Theory/Model Correspondence 


It is natural to expect that Cl(Ax) is the classifying category, as in standard
categorical type theory. This means that giving a model of Ax in J is equivalent to
giving a structure-preserving functor Cl(Ax) → J. This subsection clarifies and
studies this claim.
The set Mod(Ax, J) of models of Ax in J is defined as follows. If J ⊨ Ax,
then Mod(Ax, J) is a singleton set⁶; otherwise Mod(Ax, J) is the empty set.
We then define the notion of structure-preserving functors.


Definition 8. A strict compact closed Freyd functor from J : C ⇄ K : I ⇒ (−)
to J′ : C′ ⇄ K′ : I ⇒′ (−) is a pair of functors (Φ, Ψ) such that

- Φ is a strict finite product preserving functor from C to C′,

- Ψ is a strict symmetric monoidal functor from K to K′ that preserves the
  chosen compact closed structures (i.e. units and counits) on the nose, and

- (Φ, Ψ) is a map of adjoints between J ⊣ I ⇒ (−) and J′ ⊣ I ⇒′ (−).


The collection of (small) compact closed Freyd categories and strict compact
closed Freyd functors forms a 1-category, which we write as CCFC.


Now the question is whether Mod(Ax, J) ≅ CCFC(Cl(Ax), J) in Set.

Unfortunately this does not hold. More precisely, the left-to-right inclusion
does not hold in general. This means that the term model satisfies some
additional axioms reflecting some aspects of the π_F-calculus.

The additional axioms reflect the definition of the dual T⃗* in the term model;
we have T* := T^⊥ by definition, and thus T** = T and (T ⊗ S)* = T* ⊗ S*.
It might be surprising that these equations are harmful because isomorphisms
A** ≅ A and (A ⊗ B)* ≅ A* ⊗ B* exist in every compact closed category. The
point is that the equations also require C to have isomorphisms A** ≅ A and
(A ⊗ B)* ≅ A* ⊗ B* (witnessed by the respective identities).

We formally define the additional axioms, which we call (I) and (D):


(I) The canonical isomorphism A** → A in K is the identity.
(D) The canonical isomorphism (A ⊗ B)* → A* ⊗ B* in K is the identity.


Theorem 6. Mod(Ax, J) ≅ CCFC(Cl(Ax), J) if J satisfies (I) and (D).


⁶ Because we consider only the empty signature, the set of valuations is a singleton.

\[
\begin{array}{ll}
\text{(a) } \lambda_{c}\text{:} &
  \sigma ::= \tau \to \tau' \qquad \xi ::= \sigma \qquad \tau ::= (\xi_1, \dots, \xi_n) \\
& V ::= x \mid \lambda\langle\vec{x}\rangle.M \\
& M ::= \langle\vec{V}\rangle \mid V\,\langle\vec{V}\rangle \mid \mathbf{let}\ \langle\vec{x}\rangle = M\ \mathbf{in}\ M' \\[1ex]
\text{(b) } \lambda_{ch} \text{ (difference from } \lambda_{c}\text{):} &
  \xi ::= \cdots \mid \sigma^{*} \\
& V ::= \cdots \mid \mathbf{channel}_{\sigma} \mid \mathbf{send}_{\sigma}
\end{array}
\]

Fig. 4. Syntax of types and terms of the λ_c- and λ_ch-calculi. The syntax of λ_c is adapted
to the setting of this paper.


4 A Concurrent λ-calculus and (de)compilation


In order to demonstrate the relevance of our semantic framework, this section
gives a semantic reconstruction of fully-abstract compilation and decompilation
from a higher-order calculus to the (first-order) π-calculus, such as [39,42].
We first design an instance of the computational λ-calculus [33], named λ_ch, that
is sound and complete with respect to compact closed Freyd categories. It is
obtained by a straightforward extension of the correspondence between the
computational λ-calculus and closed Freyd categories (Sect. 4.1). There are translations
between π_F and λ_ch since both are sound and complete with respect to
compact closed Freyd categories. Section 4.2 actually calculates the translations, and
compares them with those in [39,42].


4.1 The λ_ch-calculus


The λ_ch-calculus is a computational λ-calculus with additional constructors
dealing with channels. This section introduces and explains the calculus.
The situation is nicely expressed by the following intuitive equation:

\[
\frac{\lambda_{ch}}{\lambda_{c}}
  = \frac{\text{(compact closed Freyd category + I + D)}}{\text{(closed Freyd category)}}
\]

The base calculus λ_c is the computational λ-calculus, which corresponds to closed
Freyd categories [33,37]. It is a call-by-value higher-order programming language,
given in Fig. 4(a). Our calculus λ_ch is obtained by adding type and term
constructors originating from the compact closed structure, which λ_c does not have.


Syntax. As for types, λ_ch has a new constructor coming from the dual object
A*. Normalising occurrences of the dual A* using the axioms (I) A** = A and
(D) (A ⊗ B)* = A* ⊗ B*, we obtain the following grammar of types:

\[
\sigma ::= \tau \to \tau' \qquad \xi ::= \sigma \mid \sigma^{*} \qquad \tau ::= (\xi_1, \dots, \xi_n)
\]

where n ≥ 0 and (ξ_1, ..., ξ_n) is an alternative notation for ξ_1 ⊗ ··· ⊗ ξ_n. Compared
with λ_c, the only new type is the dual type σ* of a function type σ.
As for terms, λ_ch has constructors corresponding to the unit and counit

\[
\eta_A : I \to A \otimes A^{*} \qquad \epsilon_A : A^{*} \otimes A \to I \qquad \text{(for each object } A\text{)}
\]

of the compact closed structure. We simply add these morphisms as constants:

\[
\Gamma \vdash \mathbf{channel}_{\sigma} : () \to (\sigma, \sigma^{*})
\qquad
\Gamma \vdash \mathbf{send}_{\sigma} : (\sigma^{*}, \sigma) \to ()
\]

We shall often omit the subscript σ.

In summary, we obtain the syntax of λ_ch shown in Fig. 4. Interestingly, λ_ch
can be seen as a very core of Concurrent ML [38], a practical higher-order
concurrent language, although λ_ch is developed from purely semantic considerations.
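The normalisation of duals that leads to the type grammar above can be sketched as a small recursive function. This is an illustrative sketch, not the paper's definition: the class names `Fun`, `Tensor` and `Dual` and the function `normalise` are hypothetical, and flattening nested tensors is our own assumption standing in for strictness of the monoidal product.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Fun:
    """Function type tau -> tau'."""
    dom: "Ty"
    cod: "Ty"


@dataclass(frozen=True)
class Tensor:
    """Tuple type (xi_1, ..., xi_n); the empty tuple is the unit type ()."""
    parts: Tuple["Ty", ...]


@dataclass(frozen=True)
class Dual:
    """Dual type A*."""
    body: "Ty"


Ty = Union[Fun, Tensor, Dual]


def normalise(t):
    """Push duals inwards until they only sit directly on function types."""
    if isinstance(t, Fun):
        return Fun(normalise(t.dom), normalise(t.cod))
    if isinstance(t, Tensor):
        parts = []
        for p in t.parts:
            q = normalise(p)
            # Assumption: nested tuples are flattened (strict monoidal product).
            parts.extend(q.parts if isinstance(q, Tensor) else [q])
        return Tensor(tuple(parts))
    body = normalise(t.body)
    if isinstance(body, Dual):      # axiom (I): A** = A
        return body.body
    if isinstance(body, Tensor):    # axiom (D): (A (x) B)* = A* (x) B*
        return Tensor(tuple(normalise(Dual(p)) for p in body.parts))
    return Dual(body)               # sigma*: the only dual left in normal forms


unit = Tensor(())
sigma = Fun(unit, unit)
double_dual = normalise(Dual(Dual(sigma)))
dual_of_pair = normalise(Dual(Tensor((sigma, Dual(sigma)))))
```

After normalisation a dual only ever wraps a function type, which matches the production ξ ::= σ | σ* of the grammar.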


Semantics. Let us first discuss the intuitive meanings of the new constructors.
The type σ* is for output channels; channel ⟨⟩ creates and returns a pair of an
input channel and an output channel that are connected; and send ⟨a, V⟩ sends
the value V via the output channel a. The following points are worth noting.

- λ_ch has no type constructor for input channels. The type system does not
  distinguish between input channels for type σ and values of type σ.

- λ_ch has no receive constructor. The receiving operation is implicit and on demand,
  delayed as much as possible.

- The send operator broadcasts a value via a channel. Several receivers may
  receive the same value from the same channel.

The first two points reflect the asynchrony of π_F, and the last point reflects the
absence of non-replicated input (cf. Sect. 4.2).

Based on this intuition, we develop the operational, axiomatic and categorical
semantics of λ_ch. We shall use the following abbreviations:

\[
(\nu xy)M \overset{\mathrm{def}}{=} \mathbf{let}\ \langle x, y\rangle = \mathbf{channel}\ \langle\rangle\ \mathbf{in}\ M
\qquad
M \parallel N \overset{\mathrm{def}}{=} \mathbf{let}\ \langle\rangle = M\ \mathbf{in}\ N.
\]


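Operational Semantics. Assume an infinite set X of channels, ranged over by α
and β. For each channel α, we write α for the input name and ᾱ for the output
name, both of which are values. A configuration is a tuple (M, α⃗, μ) of a term
M, a sequence α⃗ of generated channels and a sequence μ of performed send
operations, i.e. μ = (send ⟨β̄_1, V_1⟩, ..., send ⟨β̄_k, V_k⟩). The reduction relation is
defined by the following rules for channels

\[
\begin{aligned}
(E[\mathbf{channel}\ \langle\rangle], \vec{\alpha}, \mu) &\longrightarrow (E[\langle\beta, \bar{\beta}\rangle], \vec{\alpha} \cdot \beta, \mu) && (\beta \notin \vec{\alpha}) \\
(E[\mathbf{send}\ \langle\bar{\beta}, V\rangle], \vec{\alpha}, \mu) &\longrightarrow (E[\langle\rangle], \vec{\alpha}, \mu \cdot \mathbf{send}\ \langle\bar{\beta}, V\rangle) \\
(E[\beta\ V], \vec{\alpha}, \mu) &\longrightarrow (E[W\ V], \vec{\alpha}, \mu) && (\mathbf{send}\ \langle\bar{\beta}, W\rangle \in \mu)
\end{aligned}
\]

in addition to the standard rules for λ-abstractions and let-expressions, which
change only M. Here the set of evaluation contexts is given by the grammar:

\[
E ::= [\,] \mid \mathbf{let}\ \langle\vec{x}\rangle = E\ \mathbf{in}\ M \mid \mathbf{let}\ \langle\vec{x}\rangle = M\ \mathbf{in}\ E.
\]

Note that M and N in let ⟨x⃗⟩ = M in N are evaluated in parallel (cf. Remark 3).
This justifies the notation M ∥ N, an abbreviation for let ⟨⟩ = M in N.

Axiomatic Semantics. The inference rules of the equational logic for λ_ch are
those for λ_c with the rule of concurrent evaluation

\[
\mathbf{let}\ \langle\vec{x}\rangle = M\ \mathbf{in}\ \mathbf{let}\ \langle\vec{y}\rangle = N\ \mathbf{in}\ L
  \;=\;
\mathbf{let}\ \langle\vec{y}\rangle = N\ \mathbf{in}\ \mathbf{let}\ \langle\vec{x}\rangle = M\ \mathbf{in}\ L;
\]

the β- and η-rules for channels

\[
\begin{aligned}
(\nu x\bar{x})(\mathbf{send}\ \langle\bar{x}, V\rangle \parallel M) &= (\nu x\bar{x})(\mathbf{send}\ \langle\bar{x}, V\rangle \parallel M\{V/x\}) \\
(\nu y\bar{y})(\mathbf{send}\ \langle\bar{z}, y\rangle \parallel N) &= N\{\bar{z}/\bar{y}\}
\end{aligned}
\]

where x̄ ∉ Fv(V) ∪ Fv(M), y ∉ Fv(N) and z̄ ≠ ȳ; and a GC rule.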
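The three channel rules of the operational semantics can be mimicked by a toy model of the store part of a configuration. This is only a sketch under assumptions of ours: the class `Config` and its methods are hypothetical, values sent over channels are ordinary Python functions, channels are represented by integers, and the nondeterministic choice among sent values is delegated to `random.choice`. It shows the broadcast behaviour of send: a sent value is recorded in μ and never consumed, so it can be received any number of times.

```python
import random


class Config:
    """Toy model of the (alpha, mu) part of a configuration (M, alpha, mu)."""

    def __init__(self):
        self.channels = []  # generated channels (the sequence alpha)
        self.sent = []      # performed sends (the sequence mu) as (channel, value)

    def channel(self):
        """First rule: generate a fresh channel and return (input name, output name)."""
        beta = len(self.channels)  # fresh, since it is not among the generated channels
        self.channels.append(beta)
        return ("in", beta), ("out", beta)

    def send(self, out_name, value):
        """Second rule: record the send operation in mu and return the empty tuple."""
        kind, beta = out_name
        assert kind == "out"
        self.sent.append((beta, value))
        return ()

    def candidates(self, in_name):
        """All values W such that a send of W on this channel is recorded in mu."""
        kind, beta = in_name
        assert kind == "in"
        return [w for (b, w) in self.sent if b == beta]

    def apply(self, in_name, arg, rng=random):
        """Third rule: replace the input name by some sent value W, then apply W.

        If nothing has been sent yet the term is stuck; here that shows up as
        an IndexError raised by rng.choice on an empty list.
        """
        w = rng.choice(self.candidates(in_name))
        return w(arg)


cfg = Config()
x, x_out = cfg.channel()
cfg.send(x_out, lambda n: n + 1)
first = cfg.apply(x, 1)
second = cfg.apply(x, 10)
```

Both applications succeed with the same sent function, and `cfg.sent` still holds exactly one entry afterwards.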


Categorical Semantics. One can interpret λ_ch-terms in a compact closed Freyd
category with (I) and (D). The interpretation of the λ_c-calculus part is
standard [24,37]; the constant channel_σ (resp. send_σ) is interpreted as the “closure”
whose body is η_σ (resp. ε_σ) as expected.

\[
\begin{aligned}
\llbracket \Gamma \vdash \mathbf{channel}_{\sigma} : () \to (\sigma, \sigma^{*}) \rrbracket
  &\overset{\mathrm{def}}{=} J(!_{\Gamma};\ \Lambda_{I, I, \sigma \otimes \sigma^{*}}(\eta_{\sigma})) \\
\llbracket \Gamma \vdash \mathbf{send}_{\sigma} : (\sigma^{*}, \sigma) \to () \rrbracket
  &\overset{\mathrm{def}}{=} J(!_{\Gamma};\ \Lambda_{I, \sigma^{*} \otimes \sigma, I}(\epsilon_{\sigma})).
\end{aligned}
\]

The categorical semantics is sound and complete with respect to the
equational theory of the λ_ch-calculus. The proofs are basically straightforward but
there is a subtle issue in the definition of the term model: we have different
definitions of the right adjoint I ⇒ (−), which are of course equivalent but do not

coincide on the nose. Our choice here is I > (E) qf (E = 0: 


4.2 Translations Between λ_ch and π_F


The higher-order calculus λ_ch is equivalent to π_F. This is because both calculi
correspond to the same class of categories, namely, the class of compact closed
Freyd categories with (I) and (D), i.e.,

\[
(\lambda_{ch}) \;\approx\; \text{(compact closed Freyd category + I + D)} \;\approx\; (\pi_F).
\]

This subsection studies translations derived from this semantic correspondence.

The translations are defined by the interpretations in the term models. For
example, the translation ⦇−⦈ from λ_ch to π_F is induced by the interpretation
of λ_ch-terms in the term model Cl(∅). The interpretation ⟦M⟧_{Cl(∅)} of a
λ_ch-term M is an equivalence class of π_F-processes, since a morphism in Cl(∅) is an
equivalence class of π_F-processes. The translation ⦇M⦈ is defined by choosing a
representative of the equivalence class. The other direction ⟪−⟫ is obtained by
the interpretation of π_F in the term model of λ_ch.

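Figures 5 and 6 are concrete definitions of the translations for a natural choice
of representatives. Let us discuss the translations in more detail.

The translation from π_F to λ_ch (Fig. 5) is easy to understand. It directly
expresses the higher-order view of the first-order π-calculus. For example, an
output action is mapped to an application and an input-prefixing !a(x⃗).P to a
send operation of the value λ⟨x⃗⟩.⟪P⟫ via the channel a.

\[
\begin{aligned}
\langle\!\langle \mathbf{ch}^{o}[\vec{T}] \rangle\!\rangle &\overset{\mathrm{def}}{=} \langle\!\langle \vec{T} \rangle\!\rangle \to ()
&\quad
\langle\!\langle \mathbf{ch}^{i}[\vec{T}] \rangle\!\rangle &\overset{\mathrm{def}}{=} (\langle\!\langle \vec{T} \rangle\!\rangle \to ())^{*}
&\quad
\langle\!\langle T_1, \dots, T_n \rangle\!\rangle &\overset{\mathrm{def}}{=} (\langle\!\langle T_1 \rangle\!\rangle, \dots, \langle\!\langle T_n \rangle\!\rangle) \\
\langle\!\langle 0 \rangle\!\rangle &\overset{\mathrm{def}}{=} \langle\rangle
&\quad
\langle\!\langle P \mid Q \rangle\!\rangle &\overset{\mathrm{def}}{=} \langle\!\langle P \rangle\!\rangle \parallel \langle\!\langle Q \rangle\!\rangle
&\quad
\langle\!\langle (\nu \bar{x}y)P \rangle\!\rangle &\overset{\mathrm{def}}{=} (\nu \bar{x}y)\langle\!\langle P \rangle\!\rangle \\
\langle\!\langle \bar{a}\langle\vec{x}\rangle \rangle\!\rangle &\overset{\mathrm{def}}{=} \bar{a}\,\langle\vec{x}\rangle
&\quad
\langle\!\langle !a(\vec{x}).P \rangle\!\rangle &\overset{\mathrm{def}}{=} \mathbf{send}\ \langle a, \lambda\langle\vec{x}\rangle.\langle\!\langle P \rangle\!\rangle\rangle
\end{aligned}
\]

Fig. 5. Translation from π_F to λ_ch

\[
\begin{aligned}
(\!|\tau_1 \to \tau_2|\!) &\overset{\mathrm{def}}{=} \mathbf{ch}^{o}[(\!|\tau_1|\!), (\!|\tau_2|\!)^{\perp}]
&\quad
(\!|\sigma^{*}|\!) &\overset{\mathrm{def}}{=} (\!|\sigma|\!)^{\perp}
&\quad
(\!|(\xi_1, \dots, \xi_n)|\!) &\overset{\mathrm{def}}{=} (\!|\xi_1|\!), \dots, (\!|\xi_n|\!) \\
(\!|x|\!)_{p} &\overset{\mathrm{def}}{=} p \hookrightarrow x
&\quad
(\!|\lambda\langle\vec{x}\rangle.M|\!)_{p} &\overset{\mathrm{def}}{=}\ !p(\vec{x}, \vec{q}).(\!|M|\!)_{\vec{q}}
&\quad
(\!|\langle\vec{V}\rangle|\!)_{\vec{p}} &\overset{\mathrm{def}}{=} (\!|V_1|\!)_{p_1} \mid \cdots \mid (\!|V_n|\!)_{p_n} \\
(\!|V\,\langle\vec{W}\rangle|\!)_{\vec{p}} &\overset{\mathrm{def}}{=}
  (\nu\bar{a}a)(\nu\vec{\bar{b}}\vec{b})((\!|V|\!)_{a} \mid (\!|\langle\vec{W}\rangle|\!)_{\vec{b}} \mid \bar{a}\langle\vec{\bar{b}}, \vec{p}\rangle) \\
(\!|\mathbf{let}\ \langle\vec{x}\rangle = M\ \mathbf{in}\ N|\!)_{\vec{p}} &\overset{\mathrm{def}}{=}
  (\nu\vec{x}\vec{q})((\!|M|\!)_{\vec{q}} \mid (\!|N|\!)_{\vec{p}}) \\
(\!|\mathbf{channel}|\!)_{p} &\overset{\mathrm{def}}{=}\ !p(x, y).x \hookrightarrow y
&\quad
(\!|\mathbf{send}|\!)_{p} &\overset{\mathrm{def}}{=}\ !p(x, y).x \hookrightarrow y
\end{aligned}
\]

Fig. 6. Translation from λ_ch to π_F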
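The process part of the translation from π_F to λ_ch described above is directly structural, so it can be sketched as a recursive function over a process syntax tree. This is an illustrative sketch only: the class names, the plain-text output syntax (`||` for parallel evaluation, `lam` for λ, `<...>` for tuples) and the convention of writing an output name such as x̄ as the string `x_bar` are all assumptions of ours.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Nil:
    """The inert process 0."""


@dataclass(frozen=True)
class Par:
    """Parallel composition P | Q."""
    left: object
    right: object


@dataclass(frozen=True)
class Nu:
    """Name restriction binding an output name and an input name."""
    out: str
    inp: str
    body: object


@dataclass(frozen=True)
class Out:
    """Output action on an output name."""
    chan: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class In:
    """Replicated input !a(x...).P."""
    chan: str
    params: Tuple[str, ...]
    body: object


def translate(p) -> str:
    """Render the lambda-term assigned to a process, clause by clause."""
    if isinstance(p, Nil):
        return "<>"
    if isinstance(p, Par):
        return f"({translate(p.left)} || {translate(p.right)})"
    if isinstance(p, Nu):
        return f"(nu {p.out} {p.inp}) {translate(p.body)}"
    if isinstance(p, Out):
        # An output becomes an application of the output name to a tuple.
        return f"{p.chan} <{', '.join(p.args)}>"
    if isinstance(p, In):
        # An input becomes a send of the abstraction to the input name.
        return f"send <{p.chan}, lam <{', '.join(p.params)}>. {translate(p.body)}>"
    raise TypeError(f"not a process: {p!r}")


example = Nu("x_bar", "y", Par(In("y", ("z",), Out("z", ())), Out("x_bar", ("u",))))
rendered = translate(example)
```

For the example process, `rendered` is the string `(nu x_bar y) (send <y, lam <z>. z <>> || x_bar <u>)`: the receiver at y turns into a sender, which is the polarity swap discussed next in the text.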

An interesting (and perhaps confusing) phenomenon is that an input channel
in π_F is mapped to an output channel in λ_ch. This can be explained as follows.
In the name-passing viewpoint, the reduction

\[
(\nu\bar{x}y)(!y(\vec{z}).P \mid \bar{x}\langle\vec{u}\rangle) \longrightarrow (\nu\bar{x}y)(!y(\vec{z}).P \mid P\{\vec{u}/\vec{z}\})
\]

sends u⃗ to the process !y(z⃗).P, and thus x̄ is the output and y is the input. In the
process-passing viewpoint, the abstraction (z⃗).P is sent to the location of x̄, and
thus y is the output and x̄ is the input.

Next, we explain the translation from λ_ch to π_F (Fig. 6).

Let us first examine the translation of types. The most non-trivial part is
the translation of a function type τ_1 → τ_2. A key to understanding the translation
is the isomorphism τ_1 ⇒ τ_2 ≅ τ_1 ⊗ τ_2^⊥ ⇒ (). The latter form of function type
corresponds to an output channel type in π_F. Hence a function is understood as
a process additionally taking channels to which the return values are passed.

The translation ⦇M⦈_p⃗ of a λ_ch-term Γ ⊢ M : (ξ_1, ..., ξ_n) takes extra
parameters p⃗ = p_1, ..., p_n to which the values should be placed. This is a consequence
of the definition in the π_F-term model that a morphism T⃗ → S⃗ is a process
x⃗ : T⃗, y⃗ : S⃗^⊥ ⊢ P : ⋄. Here p⃗ corresponds to y⃗, Γ to x⃗ : T⃗ and ξ⃗ to S⃗.

Now it is not so difficult to understand the interpretations of constructs in the
λ_c-calculus. For example, the abstraction ⦇λ⟨x⃗⟩.M⦈_p is mapped to an abstraction
(x⃗, q⃗).⦇M⦈_q⃗ placed at p, which takes additional channels q⃗ to which the results
of the evaluation of M should be sent.
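The translation of types described above can be sketched as follows. The sketch rests on assumptions of ours: the class names `Fun`, `Dual` and `Ch` are hypothetical, a tuple type is represented as a Python tuple of its components, and the dual of a channel type is taken to flip only the top-level polarity (as suggested by reading ch^o[T] as T ⇒ I and ch^i[T] as (T ⇒ I)*).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Fun:
    """Function type tau1 -> tau2; each tau is a tuple of component types."""
    dom: Tuple[object, ...]
    cod: Tuple[object, ...]


@dataclass(frozen=True)
class Dual:
    """Dual type sigma* of a function type."""
    body: Fun


@dataclass(frozen=True)
class Ch:
    """Channel type: polarity "o" for output channels, "i" for input channels."""
    polarity: str
    args: Tuple["Ch", ...]


def dual(t: Ch) -> Ch:
    """Dual of a channel type: flip the top-level polarity only (assumption)."""
    return Ch("i" if t.polarity == "o" else "o", t.args)


def translate_xi(xi) -> Ch:
    if isinstance(xi, Fun):
        # tau1 -> tau2 is read as tau1 (x) tau2^perp -> (): an output channel
        # that carries the arguments together with the return channels.
        return Ch("o", translate_tau(xi.dom) + tuple(dual(t) for t in translate_tau(xi.cod)))
    return dual(translate_xi(xi.body))


def translate_tau(tau):
    return tuple(translate_xi(xi) for xi in tau)


proc = Fun((), ())            # () -> (), the type of suspended processes
higher = Fun((proc,), (proc,))
```

For instance, `translate_xi(higher)` is an output channel carrying one output channel (the argument) and one input channel (the place where the result is to be put).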

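It might be surprising that the interpretations of channel and send coincide.
This is because of the one-sided formulation of π_F. In the two-sided
formulation, the unit η and counit ε of the compact closed structure, corresponding to
channel and send, can be written as logical inference rules

\[
\frac{\Gamma, A, A^{*} \vdash \Delta}{\Gamma \vdash \Delta}
\qquad\text{and}\qquad
\frac{\Gamma \vdash A^{*}, A, \Delta}{\Gamma \vdash \Delta},
\]

which are different. In the one-sided formulation, however, they become

\[
\frac{\Gamma, A, A^{\perp}, \Delta^{\perp} \vdash}{\Gamma, \Delta^{\perp} \vdash}
\]

Hence η and ε (or channel and send) cannot be distinguished in π_F.

The translation ⦇−⦈ must be the inverse of ⟪−⟫ because both the term models
are the initial compact closed Freyd category with (I) and (D). That means,
∅ ▷ Γ ⊢ P = ⦇⟪P⟫⦈ and ∅ ▷ Γ ⊢ M = ⟪⦇M⦈⟫ are provable for every P and M.
This result is independent of the choice of representatives.

\[
\begin{aligned}
(\!|0|\!) &\overset{\mathrm{def}}{=} 0
&\quad
(\!|P \mid Q|\!) &\overset{\mathrm{def}}{=} (\!|P|\!) \mid (\!|Q|\!)
&\quad
(\!|(\nu xy)P|\!) &\overset{\mathrm{def}}{=} (\nu xy)(\!|P|\!)
&\quad
(\!|\bar{x}\langle v\rangle|\!) &\overset{\mathrm{def}}{=} (\!|v|\!)_{x} \\
(\!|v\langle w_1, \dots, w_n\rangle|\!) &\overset{\mathrm{def}}{=}
  (\nu\bar{a}a)(\nu\bar{b}_1 b_1) \cdots (\nu\bar{b}_n b_n)
  ((\!|v|\!)_{a} \mid (\!|w_1|\!)_{b_1} \mid \cdots \mid (\!|w_n|\!)_{b_n} \mid \bar{a}\langle\bar{b}_1, \dots, \bar{b}_n\rangle) \\
(\!|x|\!)_{a} &\overset{\mathrm{def}}{=} a \hookrightarrow \bar{x}
&\quad
(\!|(\vec{x}).P|\!)_{a} &\overset{\mathrm{def}}{=}\ !a(\vec{x}).(\!|P|\!)
\end{aligned}
\]

Fig. 7. Translation from AHOπ to π_F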


4.3 Relation to Other Calculi and Translations 


A number of higher-order concurrent calculi, as well as their translations to the
first-order π-calculus, have been proposed and studied (e.g. [29,39,40,42,45,47]).
The calculus λ_ch and the translations have many ideas in common with those
calculi and translations; see Sect. 6.

This subsection mainly discusses the relationship to the translations by
Sangiorgi [42] (see also [43]) between the asynchronous higher-order π-calculus (AHOπ
for short) and the asynchronous local π-calculus (Lπ for short). Here we focus on
this work because it is closest to ours. We shall see that our semantic or
categorical development provides us with a semantic reconstruction of Sangiorgi’s
translations, as well as an extension.

A variant of AHOπ can be seen as a fragment of λ_ch. The syntax of processes
of AHOπ and their representation by λ_ch-terms are given as follows:

\[
\begin{array}{llllllll}
\text{AHO}\pi\text{:} & v, w ::= x \mid (\vec{x}).P & \qquad P, Q ::= & 0 & \mid (P \mid Q) & \mid (\nu xy)P & \mid \bar{x}\langle v\rangle & \mid v\langle\vec{w}\rangle \\
\lambda_{ch}\text{:} & \phantom{v, w ::={}} x \phantom{{}\mid{}} \lambda\langle\vec{x}\rangle.P & & \langle\rangle & \phantom{\mid{}} P \parallel Q & \phantom{\mid{}} (\nu xy)P & \phantom{\mid{}} \mathbf{send}\ \langle x, v\rangle & \phantom{\mid{}} v\,\langle\vec{w}\rangle
\end{array}
\]

(It slightly differs from the original syntax, as ν binds a pair of names.)
This fragment is nicely described as a limitation on types:

\[
\sigma ::= (\vec{\sigma}) \to () \qquad \xi ::= \sigma \mid \sigma^{*} \qquad \tau ::= ().
\]

Recall that σ is a type for abstractions, ξ is a type for variables, and τ is a type
for terms. This limitation means that (1) an abstraction cannot take a channel
as an argument, and (2) a term M must be of the unit type, i.e. a process.

660 K. Sakayori and T. Tsukada 


Once regarding AHO7z as a fragment of Acn, the translation from AHOz to 
Tr is obtained by restricting (—) to AHO7. The resulting translation is in Fig. 7. 
As mentioned, the translation is the same as that of Sangiorgi [42] except for 
minor differences due to the slight change of the syntax. 

Sangiorgi also gave a translation in the opposite direction, from Lπ to AHOπ,
in the same paper. The calculus Lπ is a fragment of the π-calculus in which only
output channels can be passed. The i/o-separation of π_F allows us to characterise
the local version of π_F by a limitation on types. In the local variant, the
output channel type is restricted to T ::= ch^o[T], expressing that only output
channels can be passed via an output channel. Then the definition of type
environment should be changed accordingly: Γ ::= · | x : T | x : T^⊥ (since the
syntactic class represented by T is not closed under the dual (−)^⊥ in the local
setting).

Interestingly, the limitation on types in AHOπ coincides with that in Lπ
when one identifies ch^o[T] with (T) → () (as we have done in many places). In
other words, the syntactic restrictions of AHOπ and Lπ are the same semantic
condition described in different syntax. As a consequence, the image of Lπ under
(−) is indeed in AHOπ.


Remark 4. There is, however, a notable difference from Sangiorgi’s work [42].
Sangiorgi proved that the translation is fully abstract with respect to barbed
congruence; in contrast, we only show that ⊢ M = N iff ⊢ (M) = (N). In
particular, the η-rule is indispensable for our argument. The presence of the
η-rule significantly simplifies the argument, at the cost of operational
justification (recall that the η-rule is not sound with respect to barbed
congruence).

It is natural to ask how one can reconstruct the full-abstraction result with
respect to barbed congruence. An interesting observation is that, if M and N
are AHOπ processes, then ⊢^0 M = N iff ⊢^0 (M) = (N), where ⊢^0 means
provability without using η-rules. We expect that this semantic observation
explains why locality is essential, as noted in [42]; we leave the details for
future work.


5 Discussions 


Connection to Logics. We have so far studied a connection between compact
closed Freyd categories and the π-calculus. Here we briefly discuss the missing
piece of the Curry-Howard-Lambek correspondence, namely logic.

The model of this paper is closely related to linear logic. In fact, every
compact closed Freyd category is a model of linear logic (more precisely, MELL),
as an instance of a linear-non-linear model [6] (see, e.g., [27] for categorical
models of linear logic). The interpretation of formulas is shown in Table 1. It
differs from the translations by Abramsky [1] and Bellin and Scott [5] and from
the Curry-Howard correspondence for session types by Caires and Pfenning [8], but
resembles the connection between a variant of the local π-calculus and a
polarised linear logic by Honda and Laurent [19]; a detailed analysis of the
translation is left for future work.

Table 1. The categorical and π_F-calculus interpretations of MELL formulas

linear logic     compact closed Freyd category     π_F-calculus
(formula)        (object)                          (type environment)
A ⊗ B, A ⅋ B     A ⊗ B                             x : A, y : B
!A               I ⇒ A                             x : ch^o[A^⊥]
?A               (A ⇒ I)*                          x : ch^i[A]

The logic corresponding to compact closed Freyd categories should be a proper
extension of linear logic, since compact closed Freyd categories form a proper
subclass of linear-non-linear models. For example, the following rules are invalid
in linear logic but admissible in compact closed Freyd categories:

  ⊢ Γ    ⊢ Δ        ⊢ Γ, A, B    ⊢ Δ, A^⊥, B^⊥        ⊢ Γ, A, A^⊥
  -----------       ---------------------------       -------------
    ⊢ Γ, Δ                    ⊢ Γ, Δ                       ⊢ Γ


These rules, especially the second rule called multicut, were often studied in 
concurrency theory; see Abramsky et al. [2] for their relevance to concurrency. 

Do the above rules fill the gap between linear logic and compact closed Freyd
categories? Recent work by Hasegawa [15] suggests that MELL with the above rules
is still weaker than compact closed Freyd categories. First observe that the
above rules can be interpreted in any linear-non-linear model of which the
monoidal category is compact closed. Hasegawa showed that a linear-non-linear
model whose monoidal category is compact closed induces a closed Freyd category
of which the monoidal category is traced (and vice versa), but the induced Freyd
category is not necessarily compact closed. Hence the logic corresponding to
compact closed Freyd categories has further axioms or rules in addition to the
above ones. A reasonable candidate for the additional axiom is ! ≅ ?;
interestingly, Atkey et al. [3] reached a similar rule from a different
perspective. Further investigation is left for future work.


Non-empty Signature. The categorical type theory for the λ-calculus considers
a family parameterised by signatures, consisting of atomic types and constants.
It covers, for example, the λ-calculus with natural number type and
arithmetic constants (such as addition and multiplication), as well as a calculus
with integer reference type and read and update functions.

Although this paper only considers the calculus with the empty signature,
which has no additional types or constants, extending our theory to handle
non-empty signatures is, in a sense, not difficult. The easiest way is to apply
the established theory of the computational λ-calculus [33,37]. As we have seen
in Sect. 4, the π_F-calculus can be seen as a computational λ-calculus λ_ch
having constants for manipulating channels; hence the π_F-calculus with
additional constants is λ_ch with the additional constants, which is still in
the family of computational λ-calculi.

The π_F-calculus with non-empty signature has several applications. We shall
briefly discuss some of them.

An important example of π_F with non-empty signature is the calculus with
non-replicated input, which we regard as a calculus with additional “process
constants” but without any additional type. A key observation is that every
non-replicated input process a(x̃).P can be expressed as

a(x̃).P ≅^c (νb̄b)(a(x̃).b̄⟨x̃⟩ | !b(x̃).P)    (≅^c is weak barbed congruence)

and thus it suffices to deal with non-replicated input processes in a special
form, namely a : ch^i[T], b : ch^o[T] ⊢ a(x̃).b̄⟨x̃⟩ : ⋄. Adding these processes
as constants and the computational rules of a(x̃).b̄⟨x̃⟩ as equational axioms
results in a calculus with non-replicated inputs. The categorical model is a
compact closed Freyd category with distinguished morphisms (A ⇒ I) → (A ⇒ I) for
each object A which satisfy certain axioms.


This technique is applicable to synchronous output as well. Because

ā⟨z̃⟩.P ≅^c (νb̄b)(ā⟨z̃⟩.b̄⟨⟩ | !b().P),

it suffices to consider constants representing
a : ch^o[T], z̃ : T, b : ch^o[] ⊢ ā⟨z̃⟩.b̄⟨⟩ : ⋄.


6 Related Work 


Logical Studies of π-calculi. There is a considerable number of studies on
connections between process calculi and linear logic. Here we divide these
studies into two classes. These classes are substantially different; for
example, one regards the formula A ⊗ B as a type for processes with two “ports”
of type A and B, whereas the other regards it as the session type !A.B. Our work
is more closely related to the former than the latter, but some interesting
coincidences with the latter kind of studies can also be found.

The former class of research dates back to the work by Abramsky [1] and
Bellin and Scott [5], where they discovered that π-calculus processes can encode
proof-nets of classical linear logic. Later, Abramsky et al. [2] introduced the
interaction categories to give a semantic description of a CCS-like process
calculus. In their work, they observed that the compact closed structure is
important to capture the strong expressive power of process calculi.

A tighter connection between the π-calculus and proof-nets was recently
presented by Honda and Laurent [19]. They showed that an i/o-typed π-calculus
corresponds to polarised proof-nets, and introduced the notion of extended
reduction for the π-calculus to simulate cut-elimination. The π-calculus used in
this work is very similar to π_F in terms of syntax and reduction. Their calculus
is asynchronous, does not allow non-replicated inputs, and requires
i/o-separation. Furthermore, the extended reduction is almost the same as the
rules (E-BETA) and (E-GC) except for the side conditions. A significant
difference compared to our work is that their calculus is local [28,49],
reflecting the fact that the corresponding logic is polarised.

Our work is inspired by these studies. The idea of i/o-separation can already
be found in the work by Bellin and Scott, and the use of compact closed
categories is motivated by the study of interaction categories. It is worth
mentioning here that the design of π_F is also influenced by the calculus
introduced by Laird [22], although it is not a logical study but a categorical
one (see below).

The latter approach started with the Curry-Howard correspondences between
session-typed π-calculi and linear logic established by Caires, Pfenning and
Toninho [8,9] and subsequently by Wadler [48]. These correspondences are exact
in the sense that every process has a corresponding proof, and vice versa. As a
consequence, processes of the calculi inherit good properties of linear logic
proofs such as termination and confluence of cut-elimination. In terms of
process calculi, processes of these calculi do not fall into deadlocks or race
conditions. This can be seen as a serious restriction of expressive power
[3,26,48].

Several extensions to increase the expressiveness of these calculi have been
proposed and studied. Interestingly, ideas behind some of these extensions are
related to our work, in particular to Sect. 5 discussing the multicut rule [2]
and the axiom ! ≅ ?. Atkey et al. [3] studied CP [48] with the multicut rule and
! ≅ ? and discussed how these extensions increase the expressiveness of the
calculus, at the cost of losing some good properties of CP. Dardha and Gay [10]
studied another extension of CP with multicut, keeping the calculus
deadlock-free by an elaborate type system.

Balzer and Pfenning [4] proposed a session-typed calculus with shared
(mutable) resources, inspired by linear-non-linear adjunction [6].


Categorical Semantics of π-calculi. The idea of using a closed Freyd category
to model the π-calculus is strongly inspired by Laird [22]. He introduced
the distributive-closed Freyd category to describe abstract properties of a
game-semantic model of the asynchronous π-calculus and showed that
distributive-closed Freyd categories with some additional structures suffice to
interpret the asynchronous π-calculus. The additional structures are specific to
his game model and not completely axiomatised.¹ Our notion of compact closed
Freyd category might be seen as a reformulation of his idea, obtained by
filtering out some structures that are difficult to axiomatise and by
strengthening some others to make the axioms simpler. A significant difference
is that our categorical model does not deal with non-replicated inputs, which we
think is essential for a simple axiomatisation.

Another approach to categorical semantics of the π-calculus has been the
presheaf-based approach [12,44]. These studies gave particular categories that
nicely handle the nominal aspects of the π-calculus; they do not, however, aim
for a correspondence between a categorical structure and the π-calculus.


Higher-Order Calculi with Channels. Besides the λ_ch-calculus, there are a
number of functional languages augmented by communication channels, from
theoretical ones [13,25,46,48] to practical languages [34,38].

On the practical side, Concurrent ML (CML) [38], among others, is a well- 
developed higher-order concurrent language. CML has primitives to create chan- 
nels and threads, and primitives to send and accept values through channels. 


Since our λ_ch-calculus can create (non-linear) channels and send values via
channels, it can be seen as a core calculus of CML despite its origin in
categorical semantics. The major difference between CML and the λ_ch-calculus
is that communications in CML are synchronous whereas communications in the
λ_ch-calculus are asynchronous.

¹ The list of properties in [22] does not seem to be complete. We could not
prove some claims in that paper from these properties alone, but only with ones
specific to his model.

On the theoretical side, session-typed functional languages have been actively
studied [13,25,46,48]. Notably, some of these languages [25,46,48] are built upon
the Curry-Howard foundation between linear logic and session-typed processes.
It might be interesting to investigate whether we can relate these languages and
the λ_ch-calculus through the lens of the Curry-Howard-Lambek correspondence.


Higher-Order vs. First-Order π-calculus. A number of translations from
higher-order languages to the π-calculus have been developed [39,40,42,45,47]
since Milner [29] presented the encodings of the λ-calculus into the π-calculus.
The basic idea shared by these studies is to transform λx.M into a process
!a(x,p).P that receives the argument x together with a name p to which the rest
of the computation will be transmitted. In our framework, this idea is described
as the isomorphism A ⇒ B ≅ A ⊗ B* ⇒ I.

Among others, the translation from AHOπ to Lπ [42] is the closest to our
translation from the λ_ch-calculus to the π_F-calculus. Sangiorgi [41] observed
that Milner’s translation can be established via the translation of AHOπ by
applying the CPS transformation to the λ-calculus. This observation also applies
to our translation. That is, we can obtain Milner’s translation by combining the
CPS transformation and the compilation of the λ_ch-calculus.


7 Conclusion and Future Work 


We have introduced an i/o-typed π-calculus (the π_F-calculus) as well as the
categorical counterpart of the π_F-calculus (compact closed Freyd category) and
showed the categorical type theory correspondence between them. The
correspondence was established by regarding the π-calculus as a higher-order
programming language, introducing the i/o-separation, and introducing the
η-rule, a rule that explains the mismatch between behavioural equivalences and
categorical models.

As an application of our semantic framework we introduced a higher-order
calculus, the λ_ch-calculus, “equivalent” to the π_F-calculus. We have
demonstrated that translations between the λ_ch-calculus and the π_F-calculus
can be derived by a simple semantic argument, and showed that the translation
from λ_ch to π_F is a generalisation of the translation from AHOπ to Lπ given by
Sangiorgi [42].

There are three main directions for future work. First, further investigation
of the η-rule is indispensable. We plan to construct a categorical model of
the π_F-calculus with an additional constant that captures barbed congruence.
Revealing the relationship between locality and the η-rule is another important
problem. Second, the operational properties of the λ_ch-calculus and their
relation to the equational theory need further investigation. Third, finding
the logical counterpart of compact closed Freyd categories to establish a proper
Curry-Howard-Lambek correspondence is an interesting direction for future work.
Abstract. We propose a process algebra for link layer protocols, featuring
a unique mechanism for modelling frame collisions. We also formalise suitable
liveness properties for link layer protocols specified in this framework. To
show applicability we model and analyse two versions of the Carrier-Sense
Multiple Access with Collision Avoidance (CSMA/CA) protocol. Our analysis
confirms the hidden station problem for the version without virtual carrier
sensing. However, we show that the version with virtual carrier sensing not only
overcomes this problem, but also the exposed station problem with probability 1.
Yet the protocol cannot guarantee packet delivery, not even with probability 1.


1 Introduction 


The (data) link layer is the 2nd layer of the ISO/OSI model of computer network- 
ing [18]. Amongst others, it is responsible for the transfer of data between adja- 
cent nodes in Wide Area Networks (WANs) and Local Area Networks (LANs). 

Examples of link layer protocols are Ethernet for LANs [16], the Point-to- 
Point Protocol [24] and the High-Level Data Link Control protocol (e.g. [14]). 
Part of this layer are also multiple access protocols such as the Carrier-Sense Mul- 
tiple Access with Collision Detection (CSMA/CD) protocol for re-transmission 
in Ethernet bus networks and hub networks, or the Carrier-Sense Multiple Access 
with Collision Avoidance (CSMA/CA) protocol [17,19] in wireless networks. 

One of the unique characteristics of the link layer is that when devices 
attempt to use a medium simultaneously, collisions of messages occur. So, any 
modelling language and formal analysis of layer-2 protocols has to support such 
collisions. Moreover, some protocols are of probabilistic nature: CSMA/CA for 
example chooses time slots probabilistically with discrete uniform distribution. 

As we are not aware of any formal framework with primitives for modelling
data collisions, this paper introduces a process algebra for modelling and
analysing link layer protocols. In Sect. 2 we present an algebra featuring a
unique mechanism for modelling collisions, ‘hard-wired’ in the semantics. It is
the non-probabilistic fragment of the Algebra for Link Layer protocols (ALL),
which we introduce in Sect. 3. In Sect. 4 we formulate packet delivery, a
liveness property that ideally ought to hold for link layer protocols, either
outright, or with a high probability. In Sect. 5 we use this framework to
formally model and analyse the CSMA/CA protocol.

© The Author(s) 2019
L. Caires (Ed.): ESOP 2019, LNCS 11423, pp. 668-693, 2019.
https://doi.org/10.1007/978-3-030-17184-1_24

Our analysis confirms the hidden station problem for the version of 
CSMA/CA without virtual carrier sensing (Sect. 5.2). However, we also show 
that the version with virtual carrier sensing overcomes not only this problem, 
but also the exposed station problem with probability 1. Yet the protocol cannot 
guarantee packet delivery, not even with probability 1. 


2 A Non-probabilistic Subalgebra 


In this section we propose a timed process algebra that can model the collision
of link layer messages, called frames.¹ It can be used for link layer protocols
that do not feature probabilistic choice, and is inspired by the (Timed) Algebra
for Wireless Networks ((T-)AWN) [2,12,13], a process algebra suitable for
modelling and analysing protocols on layers 3 (network) and 4 (transport) of the
OSI model.

The process algebra models a (wired or wireless) network as an encapsulated 
parallel composition of network nodes. Due to the nature of the protocols under 
consideration, on each node exactly one sequential process is running. The alge- 
bra features a discrete model of time, where each sequential process maintains 
a local variable now holding its local clock value—an integer. We employ only 
one clock for each sequential process. All sequential processes in a network syn- 
chronise in taking time steps, and at each time step all local clocks advance by 
one unit. Since this means that all clocks are in sync and do not run at
different speeds, we do not consider the problem of clock shift. Otherwise,
the variable now behaves like any other variable maintained by a process: its
value can be read when evaluating guards, thereby making progress time-dependent,
and any value can be assigned to it, thereby resetting the local clock. Network
nodes communicate with their direct neighbours—those nodes that are in trans- 
mission range. The algebra provides a mobility option that allows nodes to move 
in or out of transmission range. The encapsulation of the entire network inhibits 
communications between network nodes and the outside world, with the excep- 
tion of the receipt and delivery of data packets from or to clients (the higher 
OSI layers). 


2.1 A Language for Sequential Processes 


The internal state of a process is determined, in part, by the values of certain
data variables that are maintained by that process. To this end, we assume a
data structure with several types, variables ranging over these types, operators
and predicates. Predicate logic yields terms (or data expressions) and formulas
to denote data values and statements about them. Our data structure always
contains the types TIME, DATA, MSG, CHUNK, ID and 𝒫(ID) of discrete time values,
which we take to be integers, network layer data, messages, chunks of messages
that take one time unit to transmit, node identifiers and sets of node
identifiers. We further assume that there are variables now of type TIME and rfr
of type CHUNK. In addition, we assume a set of process names. Each process name
X comes with a defining equation

X(var_1, ..., var_n) = P,

in which n ∈ ℕ, the var_i are variables and P is a sequential process expression
defined by the grammar below. It may contain the variables var_i as well as
X. However, all occurrences of data variables in P have to be bound.² The
choice of the underlying data structure and the process names with their defining
equations can be tailored to any particular application of our language.

¹ As it is the non-probabilistic fragment of a forthcoming algebra we do not
name it.

The sequential process expressions are given by the following grammar:

P ::= X(exp_1, ..., exp_n) | [φ]P | [var := exp]P | α.P | P + P

α ::= transmit(ms) | newpkt(data, dest) | deliver(data)

Here X is a process name, exp_i a data expression of the same type as var_i, φ
a data formula, var := exp an assignment of a data expression exp to a variable
var of the same type, ms a data expression of type MSG, and data, dest data
variables of types DATA, ID respectively.

Given a valuation of the data variables by concrete data values, the sequential
process [φ]P acts as P if φ evaluates to true, and deadlocks if φ evaluates to
false. In case φ contains free variables that are not yet interpreted as data
values, values are assigned to these variables in any way that satisfies φ, if
possible. The process [var := exp]P acts as P, but under an updated valuation of
the data variable var. The process P + Q may act either as P or as Q, depending
on which of the two processes is able to act at all. In a context where both are
able to act, it is not specified how the choice is made. The process α.P first
performs the action α and subsequently acts as P. The above behaviour is
identical to AWN, and many other standard process algebras. The action
transmit(ms) transmits (the data value bound to the expression) ms to all other
network nodes within transmission range. The action newpkt(data, dest) models the
injection by the network layer of a data packet data to be transmitted to a
destination dest. Technically, data and dest are variables that will be bound to
the obtained values upon receipt of a newpkt. Data is delivered to the network
layer by deliver(data). In contrast to AWN, we do not have a primitive for
receiving messages from neighbouring nodes, because our processes are always
listening to neighbouring nodes, in parallel with anything else they do.

² An occurrence of a data variable in P is bound if it is one of the variables
var_i, one of the two special variables now or rfr, a variable var occurring in
a subexpression [var := exp]Q, an occurrence in a subexpression [φ]Q of a
variable occurring free in φ, or a variable data or dest occurring in a
subexpression newpkt(data, dest).Q. Here Q is an arbitrary sequential process
expression.

As in AWN, the internal state of a sequential process described by an
expression P is determined by P, together with a valuation ξ associating values
ξ(var) to variables var maintained by this process. Valuations naturally extend
to ξ-closed expressions, i.e. those in which all variables are either bound or
in the domain of ξ. We denote the valuation that assigns the value v to the
variable var, and agrees with ξ on all other variables, by ξ[var := v]. The
valuation ξ|_S agrees with ξ on all variables var ∈ S and is undefined otherwise.
Moreover we use ξ[var++] as an abbreviation for ξ[var := ξ(var) + 1], for
suitable types.
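
The three operations on valuations can be illustrated with a small Python
sketch. It assumes, purely for illustration, that a valuation is a dictionary
from variable names to values; the helper names update, restrict and increment
are hypothetical and do not appear in the paper.

```python
from typing import Any, Dict, Iterable

# Assumption: a valuation is a partial map from variable names to values.
Valuation = Dict[str, Any]


def update(xi: Valuation, var: str, v: Any) -> Valuation:
    """xi[var := v]: assign v to var and agree with xi on all other variables."""
    zeta = dict(xi)  # copy, so the original valuation stays untouched
    zeta[var] = v
    return zeta


def restrict(xi: Valuation, s: Iterable[str]) -> Valuation:
    """xi|_S: agree with xi on the variables in S, undefined elsewhere."""
    keep = set(s)
    return {var: val for var, val in xi.items() if var in keep}


def increment(xi: Valuation, var: str) -> Valuation:
    """xi[var++]: abbreviation for xi[var := xi(var) + 1]."""
    return update(xi, var, xi[var] + 1)


xi = {"now": 7, "cntr": 0, "x": "payload"}
print(increment(xi, "now"))
print(restrict(xi, {"now", "cntr"}))
```

Returning a fresh dictionary rather than mutating in place mirrors the
mathematical reading, in which ξ and ξ[var := v] are two distinct valuations.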

To capture the durational nature of transmitting a message between network
nodes, we model a message as a sequence of chunks, each of which takes one
time unit to transmit. The function dur : MSG → TIME calculates the number
of time steps needed for sending a message, i.e. it calculates the number of
chunks. We employ the internal data type CHUNK := {m:c | m ∈ MSG, 1 ≤ c ≤
dur(m)} ∪ {conflict, idle}. The chunk m:c indicates the c-th fragment of a
message m. Data conflicts (junk transmitted via the medium) are modelled by
the special chunk conflict, and the absence of an incoming chunk is modelled
by idle.

Our process algebra maintains a variable rfr of type CHUNK, storing the
fragment of the current message received so far. As a value of this variable,
m:c indicates that the first c chunks of message m have been received in order;
conflict indicates that the last incoming chunk was not the expected (next) part
of a message in progress, and idle indicates that the channel was idle during
the last time step. The table below, with ∗ a wild card, shows how the value
of rfr evolves upon receiving a new chunk ch.

rfr      ch         new value of rfr
∗        conflict   conflict
∗        idle       idle
∗        m:1        m:1
m:c      m:c+1      m:c+1
rfr      m:c+1      conflict   (if rfr ≠ m:c)
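
The evolution of rfr can be sketched as a small Python function. This is only
an illustration of the table: it assumes that a message chunk m:c is encoded as
a pair (m, c) and that the special chunks are the strings "conflict" and "idle";
the name update_rfr is hypothetical.

```python
from typing import Tuple, Union

CONFLICT = "conflict"
IDLE = "idle"

# Assumption: a chunk is either one of the two special values or a pair (m, c).
Chunk = Union[str, Tuple[str, int]]


def update_rfr(rfr: Chunk, ch: Chunk) -> Chunk:
    """Return the new value of rfr after receiving chunk ch."""
    if ch == CONFLICT or ch == IDLE:
        return ch                 # junk or silence overrides any progress
    msg, c = ch
    if c == 1:
        return ch                 # a first chunk always starts a new message
    if rfr == (msg, c - 1):
        return ch                 # the expected next chunk of the same message
    return CONFLICT               # out-of-order chunk


rfr: Chunk = IDLE
for incoming in [("hello", 1), ("hello", 2), ("hello", 3)]:
    rfr = update_rfr(rfr, incoming)
print(rfr)
```

Receiving ("hello", 3) directly after ("hello", 1) would instead yield
conflict, since the second chunk was skipped.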

Specifications may refer to the data type CHUNK only through the Boolean
functions NEW (having a single argument msg of type MSG) and IDLE, defined by
NEW(msg) := (rfr = (msg : dur(msg))) and IDLE := (rfr = idle). A guard
[NEW(msg)] evaluates to true iff a new message msg has just been received;
[IDLE] evaluates to true iff in the last time slice the medium was idle.
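
As a rough illustration, the two predicates can be written in Python as
follows. The encoding of chunks as pairs (m, c), the names is_new and is_idle,
and the toy duration function toy_dur (one chunk per character) are assumptions
made for this sketch only.

```python
from typing import Callable, Tuple, Union

IDLE_CHUNK = "idle"
Chunk = Union[str, Tuple[str, int]]


def is_new(rfr: Chunk, msg: str, dur: Callable[[str], int]) -> bool:
    """NEW(msg): rfr equals the last chunk msg:dur(msg) of the message."""
    return rfr == (msg, dur(msg))


def is_idle(rfr: Chunk) -> bool:
    """IDLE: the medium was idle during the last time slice."""
    return rfr == IDLE_CHUNK


def toy_dur(msg: str) -> int:
    """Hypothetical duration function: one chunk per character."""
    return len(msg)


print(is_new(("abc", 3), "abc", toy_dur))
print(is_new(("abc", 2), "abc", toy_dur))
print(is_idle("idle"))
```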

The structural operational semantics of Table 1 describes how one internal
state can evolve into another by performing an action. The set Act of actions
consists of transmit(m:c, ch), wait(ch), newpkt(d, dest), deliver(d), and
internal actions τ, for each choice of m ∈ MSG, c ∈ {1, ..., dur(m)},
ch ∈ CHUNK, d ∈ DATA and dest ∈ ID, where the first two actions are time
consuming. On every time-consuming action, each process receives a chunk ch and
updates the variable rfr accordingly; moreover, the variable now is incremented
on all process expressions in a (complete) network synchronously.

Besides the special variables now and rfr, the formal semantics employs an
internal variable cntr ∈ ℕ that enumerates the chunks of split messages and is
used to identify which chunk needs to be sent next. The variables now, rfr and
cntr are not meant to be changed by ALL specifications, e.g. by using
assignments. We call them read-only and collect them in the set
RO = {now, rfr, cntr}.

Let us have a closer look at the rules of Table 1. 

The first two rules describe the sending of a message ms. Remember that
dur(ms) calculates the time needed to send ms. The counter cntr keeps track
of the time passed already. The action transmit(m:c, ch) occurs when the node
transmits the fragment m:c; simultaneously, it receives the fragment ch.³ The
counter cntr is 0 before a message is sent, and is incremented before the
transmission of each chunk. So, each chunk sent has the form ξ(ms):ξ(cntr)+1. To
ease readability we abbreviate ξ(cntr)+1 by c+. In case the (already
incremented) counter c+ is strictly smaller than the number of chunks needed to
send ξ(ms), another transmit-action is needed (Rule 1); if the last fragment has
been sent (c+ = dur(ξ(ms))) the process can continue to act as P (Rule 2).
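
The interplay of the counter with Rules 1 and 2 can be sketched in Python. The
generator below is an informal reading of the two rules, not the formal
semantics: it ignores the chunk received in parallel, encodes m:c as a pair
(m, c), and uses a caller-supplied dur function (here, hypothetically, the
length of the message string).

```python
from typing import Callable, Iterator, Tuple


def transmit(ms: str, dur: Callable[[str], int]) -> Iterator[Tuple[str, int]]:
    """Yield the chunks ms:1, ..., ms:dur(ms), one per time step."""
    if dur(ms) < 1:
        raise ValueError("a message must consist of at least one chunk")
    cntr = 0                      # the counter is 0 before a message is sent
    while True:
        c_plus = cntr + 1         # c+ abbreviates cntr + 1
        yield (ms, c_plus)        # one transmit action sends chunk ms:c+
        cntr = c_plus
        if c_plus == dur(ms):     # Rule 2: last fragment sent, continue as P
            break
        # Rule 1: c+ < dur(ms), so another transmit action is needed


chunks = list(transmit("abc", len))
print(chunks)
```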

The actions newpkt(d, dest) and deliver(d) are instantaneous and model the
submission of data d from the network layer, destined for dest, and the delivery
of data d to the network layer, respectively. The process newpkt(d, dest).P can
also wait, namely if no network layer instruction arrives.

Rule 6 defines a rule for assignment in a straightforward fashion; only the 
valuation of the variable var is updated. 

In Rules 7 and 8, which define recursion, ξ|_RO[var_i := ξ(exp_i)]_{i=1}^n is
the valuation that only assigns the values ξ(exp_i) to the variables var_i, for
i = 1, ..., n, and maintains the values of the variables now, rfr and cntr.
These rules state that a defined process X has the same transitions as the body
P of its defining equation. In case of a wait-transition, the sequential process
does not progress, and accordingly the recursion is not yet unfolded.

Most transition rules so far feature statements of the form ξ(exp) where exp
is a data expression. The application of the rule depends on ξ(exp) being
defined. Rule 9 covers all cases where the above rules cannot be applied since at
least one data expression in an action α is not defined. A state ξ, P is
unvalued, denoted by ξ(P)↑, if P has the form transmit(ms).P, deliver(data).P,
[var := exp]P or X(exp_1, ..., exp_n) with either ξ(ms) or ξ(data) or ξ(exp) or
some ξ(exp_i) undefined. From such a state the process can merely wait.

A process P + Q can wait only if both P and Q can do the same; if either
P or Q can achieve ‘proper’ progress, the choice process P + Q always chooses
progress over waiting. A simple induction shows that if ξ, P --wait(ch)--> ζ, P′
and ξ, Q --wait(ch)--> ζ′, Q′ then P = P′, Q = Q′ and ζ = ζ′.

The first rule of (12), describing the semantics of guards [φ], is taken from
AWN. Here ξ --φ--> ζ says that ζ is an extension of ξ, i.e. a valuation that
agrees with ξ on all variables on which ξ is defined, and evaluates other
variables occurring free in φ, such that the formula φ holds under ζ. All
variables not free in φ and not evaluated by ξ are also not evaluated by ζ. Its
negation ξ --φ-/-> says that no such extension exists, and thus that φ is false
in the current state, no matter how we interpret the variables whose values are
still undefined. If that is the case, the process [φ]P will idle by performing
the action wait(ch).

³ Normally, a node is in its own transmission range. In that case the received
chunk ch will be either the chunk m:c it is transmitting itself, or conflict in
case some other node within transmission range is transmitting as well.


2.2 A Language for Node Expressions 


We model network nodes in the context of a (wireless) network by node
expressions of the form

id:(ξ,P):R.

Here id ∈ ID is the address of the node, P is a sequential process expression
with a valuation ξ, and R ∈ 𝒫(ID) is the range of the node, defined as the set
of nodes within transmission range of id. Unlike AWN, the process algebra does
not offer a parallel operator for combining sequential processes; such an
operator is not needed due to the nature of link layer protocols.

In the semantics of this layer it is crucial to handle frame collisions. The idea 
is that all chunks sent are recorded, together with the respective recipient. In 
case a node receives more than one chunk at a time, a conflict is raised, as it 
is impossible to send two or more messages via the same medium at the same 
time. 
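
The collision idea can be illustrated informally in Python. The sketch assumes
that all chunks addressed to one node during a single time step are collected
in a list; the function name received_chunk is hypothetical, and the code is a
reading of the prose above rather than of the formal rules.

```python
from typing import List, Tuple, Union

CONFLICT = "conflict"
IDLE = "idle"
Chunk = Union[str, Tuple[str, int]]


def received_chunk(incoming: List[Chunk]) -> Chunk:
    """Combine the chunks sent to one node in one time step."""
    if len(incoming) == 0:
        return IDLE               # nothing was sent to this node
    if len(incoming) == 1:
        return incoming[0]        # exactly one sender: the chunk arrives intact
    return CONFLICT               # two or more senders: frames collide


print(received_chunk([]))
print(received_chunk([("m", 1)]))
print(received_chunk([("m", 1), ("n", 1)]))
```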

The formal semantics for node expressions, presented in Table 2, uses
transition labels traffic(T, R), id:deliver(d), id:newpkt(d, id′),
connect(id, id′), disconnect(id, id′) and τ, with partial functions
T, R : ID ⇀ CHUNK, id, id′ ∈ ID, and d ∈ DATA.


Table 2. Structural operational semantics for node expressions 


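P --wait(idle)--> P′
------------------------------------------
id:P:R --traffic(∅, ∅)--> id:P′:R

P --transmit(m:c, idle)--> P′
------------------------------------------------------
id:P:R --traffic({(r, m:c) | r ∈ R}, ∅)--> id:P′:R

P --wait(ch)--> P′    (ch ≠ idle)
------------------------------------------------
id:P:R --traffic(∅, {(id, ch)})--> id:P′:R

P --transmit(m:c, ch)--> P′    (ch ≠ idle)
---------------------------------------------------------------
id:P:R --traffic({(r, m:c) | r ∈ R}, {(id, ch)})--> id:P′:R

P --deliver(d)--> P′
------------------------------------
id:P:R --id:deliver(d)--> id:P′:R

P --newpkt(d, dest)--> P′
-----------------------------------------
id:P:R --id:newpkt(d, dest)--> id:P′:R

P --τ--> P′
-------------------------
id:P:R --τ--> id:P′:R

id:P:R --connect(id, id′)--> id:P:R ∪ {id′}
id:P:R --disconnect(id, id′)--> id:P:R − {id′}
id:P:R --connect(id′, id)--> id:P:R ∪ {id′}
id:P:R --disconnect(id′, id)--> id:P:R − {id′}

id ∉ {id′, id″}                            id ∉ {id′, id″}
------------------------------------       ---------------------------------------
id:P:R --connect(id′, id″)--> id:P:R       id:P:R --disconnect(id′, id″)--> id:P:R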


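Table 3. Structural operational semantics for network expressions

\[
\frac{M \xrightarrow{a} M'}{M \| N \xrightarrow{a} M' \| N}
\qquad
\frac{N \xrightarrow{a} N'}{M \| N \xrightarrow{a} M \| N'}
\qquad
\frac{M \xrightarrow{a} M'}{[M] \xrightarrow{a} [M']}
\qquad
\bigl(\forall a \in \{\tau,\ id{:}\mathrm{deliver}(d),\ id{:}\mathrm{newpkt}(d,id')\}\bigr)
\]
\[
\frac{M \xrightarrow{a} M' \quad N \xrightarrow{a} N'}{M \| N \xrightarrow{a} M' \| N'}
\qquad
\frac{M \xrightarrow{a} M'}{[M] \xrightarrow{a} [M']}
\qquad
\bigl(\forall a \in \{\mathrm{connect}(id,id'),\ \mathrm{disconnect}(id,id')\}\bigr)
\]
\[
\frac{M \xrightarrow{\mathrm{traffic}(\mathcal{T}_1,\mathcal{R}_1)} M' \quad
      N \xrightarrow{\mathrm{traffic}(\mathcal{T}_2,\mathcal{R}_2)} N'}
     {M \| N \xrightarrow{\mathrm{traffic}(\mathcal{T}_1 \uplus \mathcal{T}_2,\ \mathcal{R}_1 \uplus \mathcal{R}_2)} M' \| N'}
\qquad
\frac{M \xrightarrow{\mathrm{traffic}(\mathcal{R},\mathcal{R})} M'}
     {[M] \xrightarrow{\mathrm{tick}} [M']}
\]

All time-consuming actions on process level (transmit(m:c,ch) and wait(ch))
are transformed into an action traffic(𝒯, ℛ) on node level: the first argument
𝒯 maps dest to m:c if and only if the chunk m:c is transmitted to dest. The
second argument ℛ maps id to m:c if and only if the chunk m:c is received on
process level at node id. For the sos-rules of Table 2 we use the set-theoretic
presentation of partial functions. The two rules for wait set 𝒯 := ∅, as no
chunks are transmitted; the rules for transmit allow a transmitted chunk m:c
to travel to all nodes within transmission range: 𝒯 := {(r,m:c) | r ∈ R}. In case
that during the transmission or waiting no chunk is received (ch = idle) we set
ℛ := ∅; otherwise ℛ := {(id, ch)}, indicating that chunk ch is received by node id.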
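The way the wait and transmit rules of Table 2 assemble the two arguments of a traffic action can be sketched in a few lines of Python. This is only an illustration under assumptions: partial functions are encoded as dictionaries, the value idle is encoded as the string "idle", and the function name `node_traffic` is hypothetical.

```python
IDLE = "idle"  # assumed encoding of "no chunk received"


def node_traffic(node_id, node_range, sent_chunk, received):
    """Return the pair (T, R) of one time step of node `node_id`.

    sent_chunk: the chunk m:c being transmitted, or None for a wait action.
    received:   the chunk ch observed on process level, or IDLE.
    """
    # wait: nothing is transmitted; transmit: the chunk travels to every node in range
    transmitted = {} if sent_chunk is None else {r: sent_chunk for r in node_range}
    # only a non-idle chunk is recorded, and only for the node itself
    recv = {} if received == IDLE else {node_id: received}
    return transmitted, recv


if __name__ == "__main__":
    print(node_traffic("a", {"b", "c"}, "m:1", IDLE))
```

Each node contributes at most one entry to the second component, namely one for itself, which is the observation used later when received chunks of different nodes are combined.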

The actions id:newpkt(d, dest) and id:deliver(d) as well as the internal
actions τ are simply inherited by node expressions from the processes that run
on these nodes.

The remaining rules of Table 2 model the mobility aspect of wireless networks;
the rules are taken straight from AWN [12, 13]. We allow actions connect(id, id′)
and disconnect(id, id′) for id, id′ ∈ ID modelling a change in network topology.
These actions can be thought of as occurring nondeterministically, or as actions
instigated by the environment of the modelled network protocol. In this
formalisation node id′ is in the range of node id, meaning that id′ can receive
messages sent by id, if and only if id is in the range of id′. To break this
symmetry, one just skips the last four rules of Table 2 and replaces the
synchronisation rules for connect and disconnect in Table 3 by interleaving rules
(like the ones for deliver, newpkt and τ) [12]. For some applications a wired or
non-mobile network needs to be considered. In such cases the last six rules of
Table 2 are dropped.

Whether a node id:P:R receives its own transmissions depends on whether
id ∈ R. Only if id ∈ R will our process algebra disallow the transmission from
and to a single node id at the same time, yielding a conflict.


2.3 A Language for Networks 


A partial network is modelled by a parallel composition || of node expressions,
one for every node in the network. A complete network is a partial network
within an encapsulation operator [_], which limits the communication between
network nodes and the outside world to the receipt and delivery of data packets
to and from the network layer.
The syntax of networks is described by the following grammar:

\[
N ::= [M^T_T] \qquad
M^T_{S_1 \uplus S_2} ::= M^T_{S_1} \,\|\, M^T_{S_2} \qquad
M^T_{\{id\}} ::= id : (\xi, P) : R ,
\]

with {id} ∪ R ⊆ T ⊆ ID. Here M^T_S models a partial network describing the
behaviour of all nodes id ∈ S. The set T contains the identifiers of all nodes that
are part of the complete network. This grammar guarantees that node identifiers
of node expressions, i.e. the first component of id:P:R, are unique.

The operational semantics of network expressions is given in Table 3. Internal
actions τ as well as the actions id:deliver(d) and id:newpkt(d, id′) are
interleaved in the parallel composition of nodes that makes up a network, and then
lifted to encapsulated networks (Line 1 of Table 3).

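Actions traffic and (dis)connect are synchronised. The rule for synchronising
the action traffic (Line 3), the only action that consumes time on the
network layer, uses the union ⊎ of partial functions. It is formally defined as

\[
(\mathcal{R}_1 \uplus \mathcal{R}_2)(id) :=
\begin{cases}
\mathrm{conflict} & \text{if } id \in \mathrm{dom}(\mathcal{R}_1) \cap \mathrm{dom}(\mathcal{R}_2) \\
\mathcal{R}_1(id) & \text{if } id \in \mathrm{dom}(\mathcal{R}_1) - \mathrm{dom}(\mathcal{R}_2) \\
\mathcal{R}_2(id) & \text{if } id \in \mathrm{dom}(\mathcal{R}_2) - \mathrm{dom}(\mathcal{R}_1) .
\end{cases}
\]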


The synchronisation of the sets ℛ_i and 𝒯_i has the following intuition: if a node
identifier id ∈ ID is in both dom(𝒯_1) and dom(𝒯_2) then there exist two nodes that
transmit to node id at the same time, and therefore a frame collision occurs.
In our algebra this is modelled by the special chunk conflict. The sos rules of
Tables 2 and 3 guarantee that there cannot be collisions within the set of received
chunks ℛ. The reason is that each node merely contributes to ℛ a chunk for
itself; it can be the chunk conflict though. Therefore we could have written
ℛ_1 ∪ ℛ_2 instead of ℛ_1 ⊎ ℛ_2 in the sixth rule of Table 3.
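The union of partial functions with collision detection is easy to try out. The following Python sketch is an illustration only: partial functions are assumed to be dictionaries from node identifiers to chunks, the special chunk conflict is encoded as a string, and the name `merge` is hypothetical.

```python
CONFLICT = "conflict"  # assumed encoding of the special chunk


def merge(f, g):
    """Union of two partial functions; shared keys are mapped to CONFLICT."""
    out = {}
    for key in f.keys() | g.keys():
        if key in f and key in g:
            out[key] = CONFLICT      # two senders reach the same node
        elif key in f:
            out[key] = f[key]
        else:
            out[key] = g[key]
    return out


if __name__ == "__main__":
    t1 = {"b": "m1"}                 # node a transmits m1 to b
    t2 = {"b": "m2", "c": "m2"}      # node d transmits m2 to b and c
    print(merge(t1, t2))             # b is mapped to CONFLICT, c to "m2"
```

Because a key that occurs in more than one argument ends up as conflict no matter how the arguments are grouped, the operation is associative, a property used in the proof of Theorem 2.3.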

The last rule propagates a traffic(𝒯, ℛ)-action of a partial network M to a
complete network [M]. By then 𝒯 consists of all chunks (after collision detection)
that are being transmitted by any member in the network, and ℛ consists of all
chunks that are received. The condition ℛ = 𝒯 determines the content of the
messages in ℛ. The traffic(𝒯, ℛ)-actions become internal at this level, as they
cannot be steered by the outside world; all that is left is a time-step tick.


2.4 Results on the Process Algebra 


As for the process algebra T-AWN [2], but with a slightly simplified proof, one 
can show that our processes have no time deadlocks: 


Theorem 2.1. A complete network N in our process algebra always admits a
transition, independently of the outside environment, i.e. ∀N, ∃a such that
$N \xrightarrow{a}$ and a ∉ {connect(id, id′), disconnect(id, id′), id:newpkt(d, dest)}.

More precisely, either $N \xrightarrow{\mathrm{tick}}$, or
$N \xrightarrow{id:\mathrm{deliver}(d)}$, or $N \xrightarrow{\tau}$.


The following results (statements and proofs) are very similar to the results
about the process algebra AWN, as presented in [13]. A rich body of foundational
meta theory of process algebra allows the transfer of the results to our setting,
without too much overhead work.

Identical to AWN and its timed version T-AWN, our process algebra admits 
a translation into one without data structures (although we cannot describe the 
target algebra without using data structures). The idea is to replace any variable 
by all possible values it can take. The target algebra differs from the original only 
on the level of sequential processes; the subsequent layers are unchanged. The 
construction closely follows the one given in the appendix of [2]. The inductive
definition contains the rules

\[
\mathcal{T}_\xi(\mathrm{deliver}(data).P) = \mathrm{deliver}(\xi(data)).\mathcal{T}_\xi(P)
\quad\text{and}
\]
\[
\mathcal{T}_\xi([\![var := exp]\!]\,P) = \tau.\mathcal{T}_{\xi[var := \xi(exp)]}(P).
\]

Most other rules require extra operators that keep track of the passage of time
and the evolution of other internal variables. The resulting process algebra has a
structural operational semantics in the (infinitary) de Simone format, generating
the same transition system, up to strong bisimilarity ↔, as the original. It
follows that ↔, and many other semantic equivalences, are congruences on our
language [23].


Theorem 2.2. Strong bisimilarity is a congruence for all operators of our
language.


This is a deep result that usually takes many pages to establish (e.g. [25]). Here 
we get it directly from the existing theory on structural operational semantics, 
as a result of carefully designing our language within the disciplined framework 
described by de Simone [23]. 


Theorem 2.3. The operator || is associative and commutative, up to ↔.


Proof. The operational rules for this operator fit a format presented in [6],
guaranteeing associativity up to ↔. The ASSOC-de Simone format of [6]
applies to all transition system specifications (TSSs) in de Simone format,
and allows 7 different types of rules (named 1-7) for the operators in
question. Our TSS is in de Simone format; the four rules for || of Table 3 are
of types 1, 2 and 7, respectively. To be precise, it has rules 1_a and 2_a for
a ∈ {τ, id:deliver(d), id:newpkt(d, dest)}, rules 7_(a,b) for

\[
(a,b) \in \{ (\mathrm{traffic}(\mathcal{T}_1,\mathcal{R}_1),\ \mathrm{traffic}(\mathcal{T}_2,\mathcal{R}_2))
\mid \mathcal{R}_1, \mathcal{R}_2, \mathcal{T}_1, \mathcal{T}_2 \in ID \rightharpoonup CHUNK \}
\]

and rules 7_(c,c) for c ∈ {connect(id, id′), disconnect(id, id′) | id, id′ ∈ ID}.
Moreover, the partial communication function γ: Act × Act ⇀ Act is given by
γ(traffic(𝒯_1, ℛ_1), traffic(𝒯_2, ℛ_2)) = traffic(𝒯_1 ⊎ 𝒯_2, ℛ_1 ⊎ ℛ_2) and γ(c,c) = c.
The main result of [6] is that an operator is guaranteed to be associative, provided
that γ is associative and six conditions are fulfilled. In the absence of rules
of types 3, 4, 5 and 6, five of these conditions are trivially fulfilled, and the
remaining one reduces to

\[
7_{(a,b)} \Rightarrow (1_a \Leftrightarrow 2_b) \wedge (2_a \Leftrightarrow 2_{\gamma(a,b)})
\wedge (1_b \Leftrightarrow 1_{\gamma(a,b)}) .
\]

Here 1_a says that rule 1_a is present, etc. This condition is trivially met for || as
there exists neither a rule of the form 1_traffic(𝒯,ℛ) nor one of the form
2_traffic(𝒯,ℛ), nor a rule 1_c or 2_c with c as above. As on traffic actions γ is
basically the union of partial functions (⊎), where a collision in domains is
indicated by an error conflict, it is straightforward to prove associativity of γ.
Commutativity of || follows by symmetry of the sos rules.


3 An Algebra for Link Layer Protocols 


We now introduce ALL, the Algebra for Link Layer protocols. It is obtained
from the process algebra presented in the previous section by the addition of a
probabilistic choice operator ⊕. As a consequence, the semantics of the algebra
is no longer a labelled transition system, but a probabilistic labelled transition
system (pLTS) [8]. This is a triple (S, Act, →), where

(i) S is a set of states
(ii) Act is a set of actions
(iii) → ⊆ S × Act × 𝒟(S), where 𝒟(S) is the set of all (discrete) probability
distributions over S: functions Δ: S → [0,1] with Σ_{s∈S} Δ(s) = 1.

As with LTSs, we usually write $s \xrightarrow{a} \Delta$ instead of (s, a, Δ) ∈ →.
The point distribution δ_s, for s ∈ S, is the distribution with δ_s(s) = 1. We
simply write $s \xrightarrow{a} t$ for $s \xrightarrow{a} \delta_t$. An LTS may be
viewed as a degenerate pLTS, in which only point distributions occur. For a
uniform distribution over s_0,...,s_n ∈ S we write U_{i=0}^{n} s_i. The pLTS
associated to ALL takes S to be the disjoint union of the pairs ξ,P, with P a
sequential process expression, and the network expressions. Act is the collection
of transition labels, and → consists of the transitions derivable from the
structural operational semantics of the language.

Rules (1)-(6), (9), (11) and (12) of Table 1 are adopted in ALL unchanged,
whereas in Rules (7), (8) and (10) the state ζ, P′ (or ζ, Q′) is replaced by an
arbitrary distribution Δ. Add to those the following rule for the probabilistic
choice operator:


\[
\xi,\ \bigoplus_{i=0}^{n} P \ \xrightarrow{\ \tau\ }\ \mathcal{U}_{k=0}^{n}\ \bigl(\xi[i := k],\, P\bigr)
\]


Here the data variable i may occur in P. The rules of Tables 2 and 3 are adopted
in ALL unchanged, except that P′, M′ and N′ are now replaced by arbitrary
distributions over sequential processes and network expressions, respectively.
Here we adopt the convention that a unary or binary operation on states lifts
to distributions in the standard manner. For example, if Δ is a distribution over
sequential processes, id ∈ ID and R ⊆ ID, then id:Δ:R describes the distribution
over node expressions that only has probability mass on nodes with address id
and range R, and for which the probability of id:P:R is Δ(P). Likewise, if Δ and
Θ are distributions over network expressions, then Δ||Θ is the distribution over
network expressions of the form M||N, where (Δ||Θ)(M||N) = Δ(M) · Θ(N).
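The lifting of operators to distributions can be made concrete with a small Python sketch. It is an illustration under assumptions: discrete distributions are dictionaries from states to exact fractions, states are plain strings or tuples, and the helper names `uniform`, `lift_node` and `lift_parallel` are hypothetical.

```python
from fractions import Fraction


def uniform(states):
    """Uniform distribution over the listed states (repeats accumulate mass)."""
    states = list(states)
    p = Fraction(1, len(states))
    dist = {}
    for s in states:
        dist[s] = dist.get(s, 0) + p
    return dist


def lift_node(node_id, dist, node_range):
    """Lift a distribution over processes to node expressions id:P:R."""
    rng = frozenset(node_range)
    return {(node_id, proc, rng): p for proc, p in dist.items()}


def lift_parallel(delta, theta):
    """Product distribution: the pair (M, N) gets probability delta(M) * theta(N)."""
    return {(m, n): p * q for m, p in delta.items() for n, q in theta.items()}


if __name__ == "__main__":
    delta = uniform(["P0", "P1"])
    theta = uniform(["N0", "N1"])
    print(lift_parallel(delta, theta))
```

Using exact fractions rather than floating point numbers keeps the total probability mass equal to 1 after lifting, which makes the sketch easy to check.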
4 Formalising Liveness Properties of Link Layer Protocols


Link layer protocols communicate with the network layer through the actions 
id: newpkt(d, dest) and id: deliver(d). The typical liveness property expected 
of a link layer protocol is that if the network layer at node id injects a data 
packet d for delivery at destination dest then this packet is delivered eventually. 
In terms of our process algebra, this says that every execution of the action 
id: newpkt(d, dest) ought to be followed by the action dest: deliver(d). This 
property can be formalised in Linear-time Temporal Logic [22] as 


\[
\mathbf{G}\bigl(id{:}\mathrm{newpkt}(d, dest) \Rightarrow \mathbf{F}(dest{:}\mathrm{deliver}(d))\bigr) \tag{1}
\]

for any id, dest ∈ ID and d ∈ DATA. This formula has the shape G(φ^pre ⇒ F φ^post),
and is called an eventuality property in [22]. It says that whenever we reach a
state in which the precondition φ^pre is satisfied, this state will surely be followed
by a state where the postcondition φ^post holds. In [7,13] it is explained how action
occurrences can be seen or encoded as state-based conditions. Here we will not
define how to interpret general LTL formulas in pLTSs, but below we do this for
eventuality properties with specific choices of φ^pre and φ^post.

Formula (1) is too strong and does not hold in general: in case the nodes
id and dest are not within transmission range of each other, the delivery of
messages from id to dest is doomed to fail. We need to postulate two side
conditions to make this liveness property plausible. Firstly, when the request
to deliver the message comes in, id needs to be connected to dest. We
introduce the predicate cntd(id, dest) to express this, and hence take φ^pre to be
cntd(id, dest) ∧ id:newpkt(d, dest). Secondly, we assume that the link between
id and dest does not break until the message is delivered. As remarked in [13],
such a side condition can be formalised by taking φ^post to be dest:deliver(d) ∨
disconnect(id, dest). Thus the liveness property we are after is

\[
\mathbf{G}\bigl(\mathrm{cntd}(id, dest) \wedge id{:}\mathrm{newpkt}(d, dest) \Rightarrow
\mathbf{F}(dest{:}\mathrm{deliver}(d) \vee \mathrm{disconnect}(id, dest) \vee \mathrm{disconnect}(dest, id))\bigr) \tag{2}
\]


We now define the validity of eventuality properties G(φ^pre ⇒ F φ^post). Here
φ^pre and φ^post denote sets of transitions and actions, respectively, and hold if one of
the transitions or actions in the set occurs. In (2), φ^pre denotes the transitions with
label id:newpkt(d, dest) that occur when the side condition cntd(id, dest) is met,
whereas φ^post = {dest:deliver(d), disconnect(id, dest), disconnect(dest, id)}
is a set of actions.

A path in a pLTS (S, Act, →) is an alternating sequence s_0, a_1, s_1, a_2, ... of
states and actions, starting with a state and either being infinite or ending with
a state, such that there is a transition $s_i \xrightarrow{a_{i+1}} \Delta_{i+1}$
with Δ_{i+1}(s_{i+1}) > 0 for each i. The path is rooted if it starts with a state
marked as ‘initial’, and complete if either it is infinite, or there is no
transition starting from its last state. A state or transition is reachable if it
occurs in a rooted path.

In a pLTS with an initial state, an eventuality formula G(φ^pre ⇒ F φ^post),
with φ^pre and φ^post denoting sets of transitions and actions, holds outright if all
complete paths starting with a reachable transition from φ^pre contain a transition
with a label from φ^post.
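For a finite transition system, the definition of "holds outright" can be checked mechanically. The Python sketch below is an illustration under assumptions, not part of the formal development: probabilities are ignored (only the support of each distribution matters for paths), transitions are triples (source, label, target), and the function name `holds_outright` is hypothetical. A violation is a complete path that starts with a reachable transition from the precondition set and avoids all postcondition labels, i.e. one that either ends in a state without outgoing transitions or runs forever.

```python
def holds_outright(transitions, initial, pre, post):
    """transitions: iterable of (source, label, target);
    pre: set of transitions; post: set of labels."""
    transitions = list(transitions)
    succ = {}
    for s, a, t in transitions:
        succ.setdefault(s, []).append((a, t))

    def reach(start, allowed):
        seen, todo = {start}, [start]
        while todo:
            s = todo.pop()
            for a, t in succ.get(s, []):
                if allowed(a) and t not in seen:
                    seen.add(t)
                    todo.append(t)
        return seen

    reachable = reach(initial, lambda a: True)
    for s, a, t in transitions:
        if (s, a, t) not in pre or s not in reachable or a in post:
            continue
        # states reachable after the pre-transition without any post label
        region = reach(t, lambda label: label not in post)
        # finite complete path without post label: a state with no transitions
        if any(not succ.get(u) for u in region):
            return False
        # infinite path without post label: repeatedly prune states that
        # cannot continue inside the region; survivors lie on or lead to a cycle
        alive = set(region)
        changed = True
        while changed:
            changed = False
            for u in list(alive):
                if not any(b not in post and v in alive for b, v in succ.get(u, [])):
                    alive.discard(u)
                    changed = True
        if alive:
            return False
    return True


if __name__ == "__main__":
    ts = [("s0", "newpkt", "s1"), ("s1", "tau", "s2"), ("s2", "deliver", "s0")]
    print(holds_outright(ts, "s0", {("s0", "newpkt", "s1")}, {"deliver"}))
```

Adding a self-loop labelled tau at s1 to the example makes the property fail, since the system may then avoid the delivery forever.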

Definitions 3 and 5 in [9] define the set of probabilities that a pLTS with
an initial state will ever execute the action ω. One obtains a set of probabilities
rather than a single probability due to the possibility of nondeterministic choice.
This definition generalises to sets of actions φ^post (seen as disjunctions) by first
renaming all actions in such a set into ω. It also generalises trivially to pLTSs
with an initial transition. For t a transition in a pLTS, let Prob(t, φ^post) be the
infimum of the set of probabilities that the pLTS in which t is taken to be the
initial transition will ever execute φ^post. Now in a pLTS with an initial state, an
eventuality formula G(φ^pre ⇒ F φ^post) holds with probability at least p if for all
reachable transitions t in φ^pre we have Prob(t, φ^post) ≥ p.

Possible correctness criteria for link layer protocols are that the liveness
property (2) either holds outright, holds with probability 1, or at least holds with
probability p for a sufficiently high value of p.

Sometimes we are content to establish that (2) holds under the additional
assumptions that the network is stable until our packet is delivered, meaning that
no links between any nodes are broken or established, and/or that the network
layer refrains from injecting more packets. This is modelled by taking

\[
\varphi^{post} = \{\, dest{:}\mathrm{deliver}(d),\ \mathrm{disconnect}(*,*),\ \mathrm{connect}(*,*),\ \mathrm{newpkt}(*,*) \,\}. \tag{3}
\]

We will refer to this version of (2) as the weak packet delivery property. Packet
delivery is the strengthening without newpkt(∗, ∗) in (3), i.e. not assuming that
the network layer refrains from injecting more packets.


5 Modelling and Analysing the CSMA/CA Protocol 


In this section we model two versions of the CSMA/CA protocol, using the 
process algebra ALL. Moreover, we briefly discuss some results we obtained 
while analysing these protocols. 

The Carrier-Sense Multiple Access (CSMA) protocol is a media access control
(MAC) protocol in which a node verifies the absence of other traffic before
transmitting on a shared transmission medium. If a carrier is sensed, the node
waits for the transmission in progress to end before initiating its own
transmission. Using CSMA, multiple nodes may, in turn, send and receive on the same
medium. Transmissions by one node are generally received by all other nodes
connected to the medium.

The CSMA protocol with Collision Avoidance (CSMA/CA) [17,19]⁴
improves the performance of CSMA. If the transmission medium is sensed busy
before transmission then the transmission is deferred for a random time interval.
This interval reduces the likelihood that two or more nodes waiting to transmit
will simultaneously begin transmission upon termination of the detected
transmission. CSMA/CA is used, for example, in Wi-Fi.

4 The primary medium access control (MAC) technique of IEEE 802.11 [19] is called
distributed coordination function (DCF), which is a CSMA/CA protocol.

It is well known that CSMA/CA suffers from the hidden station problem (see
Sect. 5.2). To overcome this problem, CSMA/CA is often supplemented by the
request-to-send/clear-to-send (RTS/CTS) handshaking [19]. This mechanism is
known as the IEEE 802.11 RTS/CTS exchange, or virtual carrier sensing. While
this extension reduces the number of collisions, wireless 802.11 implementations
do not typically implement RTS/CTS for all transmissions because the
transmission overhead is too great for small data transfers.

We use the process algebra ALL to model CSMA/CA both without and with
virtual carrier sensing.


5.1 A Formal Model for CSMA/CA 


Our formal specification of CSMA/CA consists of four short processes written in 
ALL. It is precise and free of ambiguities—one of the many advantages formal 
methods provide, in contrast to specifications written in English prose. 

The syntax of ALL is intended to look like pseudo code, and it is our belief 
that the specification can easily be read and understood by software engineers, 
who may or may not have experience with process algebra. 

As the underlying data structure of our model is straightforward, we do not 
present it explicitly, but introduce it while describing the different processes. 

The basic process CSMA, depicted in Process 1, is the protocol’s entry point. 


Process 1. The Basic Routine

CSMA(id) ≝
1. newpkt(data,dest). INIT(id,0,dataframe(data,id,dest))
2. + [NEW(dataframe(data,src,id))] deliver(data) .
3.   (
4.     [[timeout := now + sifs]] [now > timeout]
5.     transmit(ackframe(src)) . CSMA(id)
6.   )


This process maintains a single data variable id in which it stores its own
identity. It waits until either it receives a request from the network layer to
transmit a packet data to destination dest, or it receives from another node in the
network a CSMA message (data frame) destined for itself.

In case of a newly injected data packet (Line 1), the process INIT is called; this
process (described below) initiates the sending of the message via the medium.
When passing the message on to INIT we use a function dataframe : DATA × ID ×
ID → MSG that generates a message in a format used by the protocol: next to
the header fields (from which we abstract) it contains the injected data as well
as the designated receiver dest and the sender id, i.e. the current node.

In case of an incoming dataframe destined for this node (Line 2), i.e. one whose
third argument, carrying the destination, is id, the data is handed over to the
network layer (deliver(data)), followed by the transmission of an acknowledgement
back to the sender of the message (src); any other incoming message is ignored
by this process. CSMA/CA requires a short period of idling medium before
sending the acknowledgement: in [19] this interval is called short interframe space
(sifs). The process waits until the time of the interframe spacing has passed, and
then transmits the acknowledgement. The acknowledgement sent is not always
received by src, e.g. due to data collision; therefore src could send the same
message again (see Process 4) and id could deliver the same data to the network
layer again.


Process 2. Protocol Initialisation

INIT(id,tries,dframe) ≝
1. [tries < max_retransmit]
2.   [[cw := cwmin × 2^tries]]
3.   ⊕_{b=0}^{cw−1} CCA(id,b,tries,dframe)   /* choose a backoff from {0,...,cw−1} */
4. + [tries ≥ max_retransmit]
5.   deliver(channel_access_failure) . CSMA(id)


The process INIT (Process 2) initiates the sending of a message via the
medium. Next to the variable id, which is maintained by all processes, it
maintains the variables tries and dframe: tries stores the number of attempts
already made to send message dframe. When the process is called the first time
for a message dframe (Line 1 of Process 1) the value of tries is 0.

The constant max_retransmit specifies the maximum number of attempts
the protocol is allowed to retransmit the same message. If the limit is not yet
reached (Line 1) the message dframe is sent. As mentioned above, CSMA/CA
defers messages for a random time interval to avoid collision. The node must start
transmission within the contention window cw, a.k.a. backoff time. cw is
calculated in Line 2; it increases exponentially.⁵ After cw is determined, the process
CCA is called, which performs the actual transmit-action. In case the maximum
number of retransmits is reached (Line 4), the process notifies the network layer
and restarts the protocol, awaiting new instructions from the application layer,
or a new incoming message.
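The choice of the contention window and of the backoff value can be sketched as follows. This is only an illustration: it assumes that Line 2 of Process 2 doubles the window with every attempt (cw = cwmin × 2^tries, which matches the statement that cw increases exponentially), it uses the typical value cwmin = 16 as a default, and the function names are hypothetical.

```python
import random


def contention_window(cwmin, tries):
    """Size of the contention window after `tries` failed attempts (assumed formula)."""
    return cwmin * 2 ** tries


def choose_backoff(cwmin, tries, rng):
    """Uniformly pick a backoff b from {0, ..., cw - 1}, as the probabilistic choice does."""
    return rng.randrange(contention_window(cwmin, tries))


if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed, for reproducibility of the demo only
    for tries in range(4):
        print(tries, contention_window(16, tries), choose_backoff(16, tries, rng))
```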

Process 3 takes care of the actual transmission of dframe. However, the
protocol follows a complicated procedure to decide when to send this message.

First, the process senses the medium and awaits the point in time when it is 
idle (Line 6). In case, before this happens, it receives from another node in the 
network a CSMA message destined for itself (Line 1), this message is handled 
just as in Process 1, except that after acknowledging this message the protocol 
returns to Process 3. 


5 A typical value for cwmin is 16; it must satisfy cwmin > 0.

Process 3. Clear Channel Assessment With Physical Carrier Sense

CCA(id,b,tries,dframe) ≝
1.  [NEW(dataframe(data,src,id))] deliver(data) .
2.    (
3.      [[timeout := now + sifs]] [now > timeout]
4.      transmit(ackframe(src)) . CCA(id,b,tries,dframe)
5.    )
6.  + [IDLE]
7.    [[timeout := now + difs]]   /* start wait for duration difs */
8.    (
9.      [¬IDLE] CCA(id,b,tries,dframe)
10.     + [IDLE ∧ now > timeout]
11.       [[timeout := now + b]]
12.       (
13.         [¬IDLE]   /* busy during backoff time */
14.           [[b := timeout − now]] CCA(id,b,tries,dframe)
15.         + [IDLE ∧ now > timeout]   /* idle for backoff time */
16.           transmit(dframe) .
17.           ACKRECV(id,tries,now+max_ack_wait,dframe)
18.       )
19.   )


To guarantee a gap between messages sent via the medium, CSMA/CA (as
well as other protocols) specifies the distributed (coordination function)
interframe space (difs ∈ TIME), which is usually small,⁶ but larger than sifs, so
that acknowledgements get priority over new data frames. When the medium
becomes busy during the interframe space, another node started transmitting
and the process goes back to listening to the medium (Line 9). In case nothing
happens on the medium and the end of the interframe space is reached (Line
10), the process determines the actual time to start transmitting the message,
taking the backoff time b into account (Line 11). If the medium is idle for the
entire backoff period (Line 15), the message is transmitted (Line 16), and the
process calls the process ACKRECV that will await an acknowledgement from the
recipient of dframe (Line 17); the third argument specifies the maximum time
the process should wait for such an acknowledgement. (As mentioned before an
acknowledgement may never arrive.) If another node transmits on the medium
during the backoff period, the protocol restarts the routine (Lines 13 and 14)
with an adjusted backoff value b: the process already started waiting and should
not be punished when the waiting is restarted. This update guarantees fairness
of the protocol.

The process awaiting an acknowledgement (Process 4) is straightforward. It
waits until either it receives a CSMA message destined for itself (Line 1), or it
receives an acknowledgement (Line 6), or it has waited for this acknowledgement
as long as it is going to (Line 8).
In the first case, the message is handled just as in Process 1, except that after
acknowledging this message the protocol returns to Process 4. In the second case
the network layer is informed that the sending of dframe was successful and the
process loops back to Process 1 (Line 7). Line 8 describes the situation where no
acknowledgement message arrives and the process times out. Here CSMA/CA
retries to send the message; the counter tries is incremented.

6 Recommended values for the constant difs are given in [19].


Process 4. Receiving an ACK

ACKRECV(id,tries,acktimeout,dframe) ≝
1. [NEW(dataframe(data,src,id))] deliver(data) .
2.   (
3.     [[timeout := now + sifs]] [now > timeout]
4.     transmit(ackframe(src)) . ACKRECV(id,tries,acktimeout,dframe)
5.   )
6. + [NEW(ackframe(id))]   /* acknowledgement received */
7.     deliver(success) . CSMA(id)
8. + [now > acktimeout] INIT(id,tries+1,dframe)

5.2 The Hidden Station Problem 


As mentioned in the introduction to this section, CSMA/CA suffers from the 
hidden station problem. This refers to the situation where two nodes A and C 
are not within transmission range of each other, while a node B is in range of 
both. In this situation C may be transmitting to B, but A is not able to sense 
this, and thus may start a transmission to B at roughly the same time, leading 
to data collisions at B. 

While CSMA/CA is not able to avoid such collisions as a whole—it is always 
possible that two (or more) nodes hidden from each other happen to (randomly) 
choose the same backoff time to send messages—it is the exponential growth of 
the backoff slots that makes the problem less pressing in the long run, as the 
following theorem shows. 


Theorem 5.1. If max_retransmit = ∞ then weak packet delivery holds with
probability 1.


Proof sketch. Since the number of messages that nodes transmit is bounded, and
all nodes select random times to start transmitting out of an increasingly longer
time span, with probability 1 each message will eventually go through.
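The intuition behind the proof sketch can be illustrated numerically. The following Python sketch uses a deliberately simplified model that is not taken from the formal development: two hidden nodes start their backoff at the same moment, each picks a start slot uniformly from {0, ..., cw−1}, every frame occupies `frame_len` consecutive slots, and two frames collide exactly when their start slots differ by less than `frame_len`. The function name `overlap_probability` is hypothetical.

```python
from fractions import Fraction


def overlap_probability(cw, frame_len):
    """Exact probability that two independently chosen start slots in
    {0, ..., cw - 1} yield overlapping frames of `frame_len` slots each."""
    clashes = sum(
        1
        for b1 in range(cw)
        for b2 in range(cw)
        if abs(b1 - b2) < frame_len
    )
    return Fraction(clashes, cw * cw)


if __name__ == "__main__":
    cwmin, frame_len = 16, 4
    for tries in range(5):
        cw = cwmin * 2 ** tries      # assumed exponential growth of the window
        print(tries, cw, float(overlap_probability(cw, frame_len)))
```

In this toy model the collision probability shrinks as the window doubles, so with an unbounded number of retries a collision-free attempt eventually occurs with probability 1. With a bounded number of retries, and frames longer than the largest window, the probability stays bounded away from 0.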


In practice, max_retransmit is set to a value that is not high enough to
approximate the idea behind the above proof. In fact, the transmission time of a single
message may be larger than the maximal backoff period allowed. For this reason
the hidden station problem does occur when running the CSMA/CA protocol,
as studies have shown [5]. Nevertheless, the above analysis still shows that link
layer protocols can be formally analysed by process algebra in general, and ALL
in particular.
Fig. 1. RTS/CTS exchange (message sequence between sender and receiver: RTS, CTS, Data, ACK)


5.3 A Formal Model for CSMA/CA with Virtual Carrier Sensing 


To overcome the hidden station problem, a request-to-send/clear-to-send
(RTS/CTS) handshaking mechanism [19] is available. This mechanism
is also known as virtual carrier sensing. The exchange of RTS/CTS messages
happens just before the actual data is sent, see Fig. 1. The mechanism serves two
purposes: (a) As the RTS and CTS messages are very short (they only contain
two node identifiers as well as a natural number indicating the time it will take to
send the actual data, plus overhead), the likelihood of a collision is reduced. (b)
While the handshaking does not help with solving the hidden station problem
for the RTS message itself, it avoids the problem for the sending of data. The
reason is that a hidden node, which could interfere with the sending of data, will
receive the CTS message from the designated recipient of data, and the hidden
node will remain silent until the data has been sent.

As for the CSMA/CA protocol we have modelled this extension in ALL,
based on the model of CSMA/CA we presented earlier.

Our extended model uses two functions to generate rts and cts messages,
respectively. The signature of both is ID × ID × TIME → MSG. The first
argument carries the sender (source) of the message, the second the intended
destination, and the third argument a duration (time period) of silence that is
requested/granted. For example, before the message rts(src,dest,d) is
transmitted, the time period d is calculated by

    [[d := sifs+dur_cts+sifs+dur(dataframe(data,id,dest))+sifs+dur_ack]].

The calculation is straightforward as it follows the protocol logic and determines
the amount of time needed until the acknowledgement would be received (see
Fig. 2). After the rts message has been received the medium should be idle for
the interframe space sifs; then a cts message is sent back, which takes time
dur_cts; then another interframe space is needed, followed by the actual
transmission of the message (the sending will take dur(dataframe(data,id,dest))
time units); after the message is received (hopefully) another interframe space is
required before the acknowledgement is sent back.
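As a small worked example, the duration carried by an rts message, and the remaining duration announced in the answering cts message (the third argument d−dur_cts−sifs of the cts in Process 5), can be computed as follows. The function names and the numeric values in the demo are assumptions chosen for illustration only.

```python
def rts_duration(sifs, dur_cts, dur_data, dur_ack):
    """Silence requested by an rts: time until the acknowledgement has been received."""
    return sifs + dur_cts + sifs + dur_data + sifs + dur_ack


def cts_duration(d, sifs, dur_cts):
    """Silence announced by the answering cts: what is left of d once the
    interframe space and the cts itself have been accounted for."""
    return d - dur_cts - sifs


if __name__ == "__main__":
    # illustrative values only, not taken from any standard
    d = rts_duration(sifs=1, dur_cts=2, dur_data=10, dur_ack=2)
    print(d, cts_duration(d, sifs=1, dur_cts=2))
```

The remaining duration covers one interframe space, the data frame, another interframe space and the acknowledgement.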


Process 2 remains essentially unchanged; it is merely equipped with the
destination dest of the message that needs to be transmitted, and an additional
timed variable nav ∈ TIME. These variables are not used in this process, but
required later on. Variable nav holds the point in time until the process should
not transmit any rts or cts message. This period of silence is necessary as the
node figures out that until time nav another node will transmit message(s).⁷

Fig. 2. The use of virtual channel sensing using CSMA/CA [3]

Process 5 is the modified version of Process 1. Identical to Process 1 it awaits
an instruction from the network layer, or an incoming CSMA message destined
for itself. Lines 1-3 are identical to Process 1. Lines 4-11 handle the two new
message types. In case an rts message rts(src,dest,d) is received that is intended
for another recipient (dest ≠ id) the node concludes that another node wants to
use the medium for the amount of d time units; the process updates the variable
nav if needed, indicating the period the node should remain silent, by taking
the maximum of the current value of nav, and now+d, the point in time until
the sender src of the rts message requires the medium. The same behaviour
occurs if a cts message is received that is not intended for the node itself (Line
4). If the incoming message is an rts message intended for the node itself (Line
6) by default the node answers with a clear-to-send message back to the sender
(Line 9). However, when the receiver of the rts has knowledge about other nodes
requiring the medium (now < nav), a clear-to-send cannot be granted, and the
request is dropped (Line 6). Similar to the sending of an acknowledgement (Line
2), the process waits for the short interframe space (sifs) before sending the
CTS (Line 6). Line 8 handles the case where the medium becomes busy (¬IDLE)
during this period; also here a clear-to-send cannot be granted, and the request
is dropped. Only when the medium stays idle during the entire interframe space
can the node id inform the source of the rts message that the medium is clear
to send; the cts is transmitted in Line 9. The time a receiver of this message
has to be silent is adjusted by deducting the time elapsed before this happens.
In Line 10 the process resets nav to remind itself not to issue any rts message
until the present exchange has been completed.⁹


⁷ After a successful RTS/CTS exchange, communicating nodes proceed with
transmitting the data and an acknowledgement regardless of the value of nav.

⁸ The condition now > timeout−sifs prevents the process from dropping the request
in the very first time slice that CSMA is running. Here the medium counts as busy,
but only because we have just received an rts message.

⁹ A case NEW(cts(src,dest,d)) ∧ dest = id is not required, as a cts message is only
expected in case an rts was sent, and hence handled in process RTSREACT.

Process 5. The Basic Routine (RTS/CTS)


CSMA(id,nav) ≝

 1.   newpkt(data,dest) . INIT(id,dest,0,dataframe(data,id,dest),nav)
 2. + [NEW(dataframe(data,src,id))] deliver(data) . [timeout := now + sifs]
 3.     [now > timeout] transmit(ackframe(src)) . CSMA(id,nav)
 4. + [(NEW(rts(src,dest,d)) ∨ NEW(cts(src,dest,d))) ∧ dest ≠ id ∧ nav < now+d]
 5.     [nav := now+d] CSMA(id,nav)
 6. + [NEW(rts(src,id,d)) ∧ now > nav] [timeout := now + sifs]
 7.     (
 8.       [¬IDLE ∧ now > timeout−sifs] CSMA(id,nav)
 9.       + [IDLE ∧ now > timeout] transmit(cts(id,src,d−dur_cts−sifs)) .
10.         [nav := now+d−dur_cts−sifs] CSMA(id,nav)
11.     )
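The two nav-related rules of Process 5 can be sketched in Python as follows. This is only an informal reading of Lines 4-5 (overhearing an rts or cts meant for another node) and Lines 9-10 (the shortened duration placed in the cts); the function names are hypothetical, and time is modelled as plain numbers rather than the paper's TIME type.

```python
def on_overheard(nav, now, d, dest, my_id):
    """Lines 4-5: an rts/cts addressed to another node may extend nav.

    nav only ever moves forward, i.e. it becomes max(nav, now + d).
    """
    if dest != my_id and nav < now + d:
        return now + d
    return nav


def cts_duration(d, dur_cts, sifs):
    """Lines 9-10: duration announced in the cts.

    The sifs wait and the cts transmission are deducted from the
    duration d that was announced in the rts.
    """
    return d - dur_cts - sifs


# A node "B" overhears an rts from some node to "C" lasting 10 time units.
print(on_overheard(nav=0, now=5, d=10, dest="C", my_id="B"))
```

Keeping the update monotone matches the description in the text: the node stays silent until the latest point in time announced by any overheard message.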


Process 6. Clear Channel Assessment With Virtual Carrier Sense


CCA(id,dest,b,tries,dframe,nav) ≝

 1.   [NEW(dataframe(data,src,id))] deliver(data) . [timeout := now + sifs]
 2.     [now > timeout] transmit(ackframe(src)) . CCA(id,dest,b,tries,dframe,nav)
 3. + [(NEW(rts(src,dest,d)) ∨ NEW(cts(src,dest,d))) ∧ dest ≠ id ∧ nav < now+d]
 4.     [nav := now+d] CCA(id,dest,b,tries,dframe,nav)
 5. + [NEW(rts(src,id,d)) ∧ now > nav] [timeout := now + sifs]
 6.     (
 7.       [¬IDLE ∧ now > timeout−sifs] CCA(id,dest,b,tries,dframe,nav)
 8.       + [IDLE ∧ now > timeout] transmit(cts(id,src,d−dur_cts−sifs)) .
 9.         [nav := now+d−dur_cts−sifs] CCA(id,dest,b,tries,dframe,nav)
10.     )
11. + [IDLE ∧ now > nav]
12.     [timeout := now+difs]
13.     (
14.       [¬IDLE] CCA(id,dest,b,tries,dframe,nav)
15.       + [IDLE ∧ now > timeout]
16.         [timeout := now + b]
17.         (
18.           [¬IDLE] /* busy during backoff time */
19.             [b := timeout − now] CCA(id,dest,b,tries,dframe,nav)
20.           + [IDLE ∧ now > timeout] /* idle for backoff time */
21.             [d := sifs + dur_cts + sifs + dur(dframe) + sifs + dur_ack]
22.             transmit(rts(id,dest,d)) .
23.             CTSRECV(id,dest,tries,now + max_cts_wait,dframe,nav)
24.         )
25.     )


Process 6 is the modified version of Process 3. The goal of this process is to
send an rts message (Line 22). Before it can start its work, it waits until the
medium is idle and any time it is required to be silent has elapsed (Line 11).
Until this happens, incoming data frames and rts or cts messages are treated just
as in Process 5: Lines 1-10 copy Lines 2-11 of Process 5, except that afterwards
the process returns to itself. Lines 12-20 are then copied from Lines 7-15 of
Process 3. Line 21 calculates the time other nodes ought to keep silent when
receiving the rts message, and Line 23 passes control to the process CTSRECV,
which awaits a cts response to the rts message transmitted in Line 22. The
fourth argument of CTSRECV specifies the maximum time that process should
wait for such a response; a good value for max_cts_wait is sifs + dur_cts.

Process CTSRECV listens during this time for a cts message with source dest and
destination id. In case the expected cts message arrives in time (Line 1), the
node waits for a time sifs (Line 2) and then transmits the data frame and
proceeds to await an acknowledgement (Line 3). The fourth argument of ACKRECV
specifies the maximum time the process should wait for such an acknowledgement;
a good value for max_ack_wait is sifs+dur_ack. If the cts message does
not arrive in time (Line 6), the process returns to INIT to send another rts
message, while incrementing the counter tries (Line 7). While waiting for the
cts message, any incoming rts or cts message destined for another node is
treated exactly as in Process 5 (Lines 4-5). Incoming data frames cannot arrive
when this process is running, and incoming rts messages to id are ignored.
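The behaviour of CTSRECV described above can be approximated by a small event loop. The sketch below is an informal Python rendering under simplifying assumptions: incoming messages are given as a pre-sorted list of hypothetical tuples (time, kind, src, dst, d), and the function name and return values are invented for the example rather than taken from the formal model.

```python
def cts_recv(events, my_id, dest, ctstimeout, nav):
    """Wait for a cts from dest; return (outcome, nav).

    outcome is "send_data" if the expected cts arrives in time and
    "retry" otherwise (the caller would then re-enter INIT with tries+1).
    """
    for now, kind, src, dst, d in events:
        if now > ctstimeout:
            break  # timed out: nothing useful arrived in time
        if kind == "cts" and src == dest and dst == my_id:
            return "send_data", nav
        # rts/cts for another node: extend the silence period if needed
        if kind in ("rts", "cts") and dst != my_id and nav < now + d:
            nav = now + d
        # anything else (e.g. an rts addressed to my_id) is ignored
    return "retry", nav


events = [(1, "rts", "X", "Y", 50), (3, "cts", "B", "A", 100)]
print(cts_recv(events, my_id="A", dest="B", ctstimeout=10, nav=0))
```

In this reading, nav is still updated while waiting, so a later retry respects silence periods learned in the meantime.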


Process 7. Receiving a CTS


CTSRECV(id,dest,tries,ctstimeout,dframe,nav) ≝

1.   [NEW(cts(dest,id,d))]
2.     [timeout := now + sifs] [now > timeout]
3.     transmit(dframe) . ACKRECV(id,dest,tries,now + max_ack_wait,dframe,nav)
4. + [(NEW(rts(src,dest,d)) ∨ NEW(cts(src,dest,d))) ∧ dest ≠ id ∧ nav < now+d]
5.     [nav := now+d] CTSRECV(id,dest,tries,ctstimeout,dframe,nav)
6. + [now > ctstimeout]
7.     INIT(id,dest,tries+1,dframe,nav)


Process 8. Receiving an ACK


ACKRECV(id,dest,tries,acktimeout,dframe,nav) ≝

1.   [NEW(ackframe(id))]
2.     deliver(success) . CSMA(id,nav)
3. + [(NEW(rts(src,dest,d)) ∨ NEW(cts(src,dest,d))) ∧ dest ≠ id ∧ nav < now+d]
4.     [nav := now+d] ACKRECV(id,dest,tries,acktimeout,dframe,nav)
5. + [now > timeout] /* nothing received */
6.     INIT(id,dest,tries+1,dframe,nav)


Process 8 handles the receipt of an acknowledgement in response to a success- 
ful data transmission. If an acknowledgement arrives, it must be from the node 
to which id has transmitted a data frame. In that case (Line 1), the network 
layer is informed that the sending of dframe was successful and the process loops 
back to Process 5 (Line 2). Line 5 describes the situation where no
acknowledgement message arrives and the process times out. Here too CSMA/CA
retries sending the message; the counter tries is incremented. Lines 3-4 describe
the usual handling of incoming rts or cts messages destined for another node.


5.4 The Exposed Station Problem 


Another source of collisions in CSMA/CA is the well-known exposed station 
problem. This refers to a linear topology A — B — C — D, where an unending 
stream of messages between C and D interferes with attempts by A to get a 
message across to B. In the default CSMA/CA protocol as formalised in Sect. 5.1, 
transmissions from A to B may perpetually collide at B with transmissions from 
C destined for D. CSMA/CA with virtual carrier sensing mitigates this problem, 
for a cts sent by B in response to an rts sent by A will tell C to keep silent 
for the required duration. In fact, we can show that in the above topology, 
if max_retransmit = ∞ then packet delivery holds with probability 1. A
non-probabilistic guarantee cannot be given since nodes A and C could behave in
the same way, meaning if one node is sending out a message the other does the 
same at the very same moment, and if one is silent the other remains silent as 
well. In this scenario all messages to be sent are doomed. 
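The gap between "with probability 1" and an outright guarantee can be illustrated with a deliberately simplified calculation. The Python sketch below assumes two contending nodes that each draw a backoff slot uniformly and independently from a fixed window of cw slots in every attempt; this is an assumption made for the example only and not the backoff model of the paper, and the function name is hypothetical.

```python
from fractions import Fraction


def all_attempts_collide(cw, attempts):
    """Probability that both nodes pick the same slot in every attempt.

    With a fixed window of cw slots, a single attempt collides with
    probability 1/cw, so all attempts collide with probability (1/cw)**attempts.
    """
    return Fraction(1, cw) ** attempts


for n in (1, 5, 10):
    print(n, all_attempts_collide(cw=4, attempts=n))
```

Under these assumptions the probability of perpetual collision tends to 0 as the number of attempts grows, which is why delivery can hold with probability 1 when retries are unbounded. The run in which both nodes always make the same choice nevertheless exists, which is why no non-probabilistic guarantee is possible.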

Based on our formalisation, we can prove that once the RTS/CTS handshake 
has been successfully concluded, meaning that all nodes within range of the 
intended recipient have received the cts, then packet delivery holds outright. So 
the only problem left is to achieve a successful RTS/CTS handshake. Since rts 
and cts messages are rather short, even for modest values of max_retransmit it
becomes likely that such messages do not collide.

In spite of this, CSMA/CA with (or without) virtual channel sensing cannot
achieve packet delivery with probability 1 for general topologies. Assume the
following network topology:

[Figure: network topology with nodes A, B, C_i, and D_i]

Here it may happen that one of the C_i is always busy transmitting a large
message to D_i; any given C_i is occasionally silent (not sending any message), but
then one of the others is transmitting. As C_i is disconnected from C_j, for j ≠ i,
coordination between the nodes is impossible. As a consequence, the medium at
A will always be busy, so that A cannot send an rts message to B.


6 Related Work 


The CSMA protocol in its different variants has been analysed with different 
formalisms in the past. 

Multiple analyses were performed for the CSMA/CD protocol (CSMA with 
collision detection), a predecessor of CSMA/CA that has a constant backoff, i.e.
the backoff time is not increased exponentially, see [10,11,20,21,26]. In all these
approaches frame collisions have to be modelled explicitly, as part of the pro- 
tocol description. In contrast, our approach handles collisions in the semantics; 
thereby achieving a clear separation between protocol specifications and link 
layer behaviour. 

Duflot et al. [10,11] use probabilistic timed automata (PTAs) to model the 
protocol, and use probabilistic model checking (PRISM) and approximate model 
checking (APMC) for their analysis. The model explained in [26] is based on 
PTAs as well, but uses the model checker UPPAAL as verification tool. These 
approaches, although formal, have very little in common with our approach. On 
the one hand it is not easy to change the model from CSMA/CD to CSMA/CA, 
as the latter requires unbounded data structures (or the like) to model the
exponential backoff. On the other hand, as usual, model checking suffers from state
space explosion and only small networks (usually fewer than ten nodes) can
be analysed. This is sufficient and convenient when it comes to finding
counterexamples, but these approaches cannot provide guarantees for arbitrary network
topologies, as ours does. 

Jensen et al. [20] use models of CSMA/CD to compare the tools SPIN and 
UPPAAL. Their models are much more abstract than ours. It is proven that no 
collisions will ever occur, without stating the exact conditions under which this 
statement holds. 

To the best of our knowledge, Parrow [21] is the only one who used process 
algebra (CCS) to model and analyse CSMA. His untimed model of CSMA/CD 
is extremely abstract and the analysis performed is limited to two nodes only, 
avoiding scenarios such as the hidden station problem. 

There are far fewer formal analysis techniques available when it comes to
CSMA/CA (with and without virtual medium sensing). Traditional approaches
to the analysis of network protocols are simulation and test-bed experiments.
This is also the case for CSMA/CA (e.g. [4]). While these are important and
valid methods for protocol evaluation, in particular for quantitative performance
evaluation, they have limitations with regard to the evaluation of basic protocol
correctness properties.

Following the spirit of the above-mentioned research of model checking CSMA, 
Fruth [15] analyses CSMA/CA using PTAs and PRISM. He considers properties 
such as the minimum probability of two nodes successfully completing their 
transmissions, and maximum expected number of collisions until two nodes have 
successfully completed their transmissions. As before, this analysis technique 
does not scale; in [15] the experiments are limited to two contending nodes only. 

Beyond model checking, simulation and test-bed experiments, we are only 
aware of two other formal approaches. In [1] Markov chains are used to derive 
an accurate, analytical model to compute the throughput of CSMA/CA. Cal- 
culating throughput is an orthogonal task to our vision of proving (functional) 
correctness. 

An approach aiming at proving the correctness of CSMA/CA with virtual 
carrier sensing (RTS/CTS), and hence related to ours, is presented in [3]. Based
on stochastic bigraphs with sharing, it uses rewrite rules to analyse quantitative
properties. Although it is an approach that is capable of analysing arbitrary
topologies, to apply the rewrite rules a particular topology needs to be modelled
by a directed acyclic graph structure, which is part of the bigraph.


7 Conclusion 


In this paper we have proposed a novel process algebra, called ALL, that can 
be used to model, verify and analyse link layer protocols. Since we aimed at a 
process algebra featuring aspects of the link layer such as frame collisions, as 
well as arbitrary data structures (to model a rich class of protocols), we could 
not use any of the existing algebras. The design of ALL is layered. The first 
layer allows modelling protocols in some sort of pseudo code, which hopefully 
makes our approach accessible for network and software researchers/engineers. 
The other layers are mainly for giving a formal semantics to the language. The 
layer of partial network expressions, the third layer, provides a unique and sophis- 
ticated mechanism for modelling the collision of frames. As it is hard-wired in 
the semantics there is no need to model collisions manually when modelling a 
protocol, as it was done before [21]. Next to primitives needed for modelling link 
layer protocols (e.g. transmit) and standard operators of process algebra (e.g. 
nondeterministic choice), ALL provides an operator for probabilistic choice. 

This operator is needed to model aspects of link layer protocols such as the 
exponential backoff for the Carrier-Sense Multiple Access with Collision Avoid- 
ance protocol, the case study we have chosen to demonstrate the applicability 
of ALL. We have modelled and analysed two versions of CSMA/CA, without 
and with virtual carrier sensing. Our analysis has confirmed the hidden station 
problem for the version without virtual carrier sensing. However, we have also 
shown that the version with virtual carrier sensing overcomes not only this prob- 
lem, but also the exposed station problem with probability 1. Yet the protocol 
cannot guarantee packet delivery, not even with probability 1. 

To perform this analysis we had to formalise suitable liveness properties for 
link layer protocols specified in our framework.

Abstract. We consider a class of interrupt-driven programs that model
the kernel API libraries of some popular real-time embedded operating 
systems and the synchronization mechanisms they use. We define a natu- 
ral notion of data races and a happens-before ordering for such programs. 
The key insight is the notion of disjoint blocks to define the synchronizes- 
with relation. This notion also suggests an efficient and effective lockset 
based analysis for race detection. It also enables us to define efficient 
“sync-CFG” based static analyses for such programs, which exploit data 
race freedom. We use this theory to carry out static analysis on the 
FreeRTOS kernel library to detect races and to infer simple relational 
invariants on key kernel variables and data-structures. 


Keywords: Static analysis · Interrupt-driven programs · Data races


1 Introduction 


Embedded software is widespread and increasingly employed in safety-critical 
applications in medical, automobile, and aerospace domains. These programs 
are typically multi-threaded applications, running on uni-processor systems, that 
are compiled along with a kernel library that provides priority-based schedul- 
ing, and other task management and communication functionality. The appli- 
cations themselves are similar to classical multi-threaded programs (using lock, 
semaphore, or queue based synchronization) although they are distinguished by 
their priority-based execution semantics. The kernel on the other hand typically 
makes use of non-standard low-level synchronization mechanisms (like disabling- 
enabling interrupts, suspending the scheduler, and flag-based synchronization) 
to ensure thread-safe access to its data-structures. In the literature such software 
(both applications and kernels) are referred to as interrupt-driven programs. Our 
interest in this paper is in the subclass of interrupt-driven programs correspond- 
ing to kernel libraries. 

Efficient static analysis of concurrent programs is a challenging problem. One 
could carry out a precise analysis by considering the product of the control flow 
graphs (CFGs) of the threads, however this is prohibitively expensive due to the 
exponential number of program points in the product graph. A promising direc- 
tion is to focus on the subclass of race-free programs. This is an important class
of programs, as most developers aim to write race-free code, and one could try
to exploit this property to give an efficient way of analyzing programs that fall in 
this class. In recent years there have been many techniques [7,11,12,18,21] that 
exploit the race-freedom property to perform sound and efficient static analysis. 
In particular [11,21] create an appealing structure called a “sync-CFG” which 
is the union of the control flow graphs of the threads augmented with possi- 
ble “synchronization” edges, and essentially perform sequential analysis on this 
graph to obtain sound facts about the concurrent program. However these tech- 
niques are all for classical lock-based concurrent programs. A natural question 
asks if we can analyze interrupt-driven programs in a similar way. 

There are several challenges in doing this. Firstly one needs to define what 
constitutes a data race in a generalized setting that includes these programs. 
Secondly, how does one define the happens-before order, and in particular the 
synchronizes-with relation that many of the race-free analysis techniques rely 
on, given the ad-hoc synchronization mechanisms used in these programs. 

A natural route that suggests itself is to translate a given interrupt-driven 
program into one that uses classical locks, and faithfully captures the interleaved 
executions of the original program. One could then use existing techniques for 
lock-based concurrency to analyze these programs. However, this route is fraught 
with many challenges. To begin with, it is not clear how one would handle flag- 
based synchronization which is one of the main synchronization mechanisms 
used in these programs. Even if one could handle this, such a translation may 
not preserve data races, in that the original program might have had a race but 
the translated program does not. Finally, some of the synchronizes-with edges in 
the translated program are clearly unnecessary, leading to imprecise data-flow 
facts in the analyses. 

In this paper, we show that it is possible to take a more organic route and 
address these challenges in a principled way that could apply to other non- 
standard classes of concurrent systems as well. Firstly, we propose a general 
definition of a data race that is not based on a happens-before order, but on 
the operational semantics of the class of programs under consideration. The
definition essentially says that two statements s and t can race, if two notional
“blocks” around them can overlap in time during an execution. We believe that 
this definition accurately captures what it is that a programmer tries to avoid 
while dealing with shared variables whose values matter. Secondly we propose 
a way of defining the synchronizes-with relation, based on the notion of disjoint 
blocks. These are statically identifiable pairs of path segments in the CFGs of dif- 
ferent threads that are guaranteed to never overlap (in time) during an execution 
of the program, much like blocks of code that lie between an acquire and release 
of the same lock. This relation now suggests a natural sync-CFG structure on 
which we can perform analyses like value-set (including interval, null-dereference,
and points-to analysis), and region-based relational invariant analysis, in a sound 
and efficient manner. We also use the notion of disjoint blocks to define an effi- 
cient and precise lock-set-based analysis for detecting races in interrupt-driven 
programs.

We implement some of these analyses on the FreeRTOS kernel library [3]
which is one of the most widely used open-source real-time kernels for embed- 
ded systems, comprising about 3,500 lines of C code. Our race-detection analysis 
reports a total of 64 races in kernel methods, of which 18 turn out to be true 
positives. We also carry out a region-based relational analysis using an imple- 
mentation based on CIL [22]/Apron [15], to prove several relational invariants 
on the kernel variables and abstracted data-structures. 


2 Overview 


We give an overview of our contributions via an illustrative example modelled 
on a portion of the FreeRTOS kernel library. Figure 1 shows an interrupt-driven 
program that contains a main thread that first initializes the kernel variables. 
The variables represent components of a message queue, like msgw (the number 
of messages waiting in the queue), len (max length of the queue), wtosend (the 
number of tasks waiting to send to the queue), wtorec (the number of tasks 
waiting to receive from the queue), and RxLock (a counter which also acts as 
a synchronization flag that mediates access to the waiting queues). The main 
thread then creates (or spawns) two threads: qsend which models the kernel
API method for sending a message to the queue, and qrec_ISR which models
a method for receiving a message, and which is meant to be called from an
interrupt-service routine. The basic semantics of this program is that the ISR
thread can interrupt qsend at any time (provided interrupts are not disabled),
but always runs to completion itself. The threads use disableint/enableint 
to disable and enable interrupts, suspendsch/resumesch to suspend/resume 
the scheduler (thereby preventing preemption by another non-ISR thread), and 
finally flag-based synchronization (using the RxLock variable), as different means 
to ensure mutual exclusion. 

Our first contribution is a general notion of data races which is applicable 
to such programs. We say that two conflicting statements s and t in two dif- 
ferent threads are involved in a data race if assuming s and t were enclosed in 
a notional “block” of skip statements, there is an execution in which the two 
blocks “overlap” in time. The given program can be seen to be free of races. 
However if we were to remove the disableint statement of line 10, then the 
statements accessing msgw in lines 12 and 42 would be racy, since soon after the 
access of msgw in qsend at line 12, there could be preemption by qrec_ISR which
goes on to execute line 42. 

Next we illustrate the notion of “disjoint blocks” which is the key to defining 
synchronizes-with edges, which we need in our sync-CFG analysis as well as to 
define an appropriate happens-before relation. Disjoint blocks are also used in 
our race-detection algorithm. A pair of blocks of code (for example any of the 
like-shaded blocks of code in the figure) are disjoint if they can never overlap 
during an execution. For example, the block comprising lines 11-14 in qsend and 
the whole of qrec_ISR, form a pair of disjoint blocks.

Next we give an analysis for checking race-freedom, by adapting the standard 
lockset analysis [24] for classical concurrent programs. We associate a unique

main:
 1  msgw := 0;
 2  len := 10;
 3  wtosend := 0;
 4  wtorec := 0;
 5  RxLock := 0;
 6  create(qsend);
 7  create(qrec_ISR);

qsend:
10  disableint;
11  if (…) {
12    msgw++;
13    if (wtorec > 0)
14      wtorec--;
15    enableint;
16  }
17  else {
18    enableint;
19    suspendsch;
20    disableint;
21    RxLock++;
22    enableint;
23    wtosend++;
24    disableint;
25    while (RxLock > 1) {
26      if (wtosend > 0)
27        wtosend--;
28      RxLock--;
29    }
30    RxLock := 0;
31    enableint;
32    resumesch;
33  }

qrec_ISR:
41  …
42  …
43  …
44  if (wtosend > 0)
45    wtosend--;
46  …
47  else
48    RxLock++;
49  …

[Figure: lines marked … could not be recovered from the extracted text. The
original figure also shows block shading, dashed sync-with edges, and inferred
invariants over msgw, len, RxLock, wtorec, and wtosend at several program points.]

Fig. 1. An interrupt-driven program modelled on the FreeRTOS kernel library.
Similarly shaded blocks denote disjoint blocks. Some of the sync-with edges are
shown in dashed lines. Some edges like 22 → 41 and 49 → 20 have been omitted
for clarity.


lock with each pair of disjoint blocks, and add notional acquires and releases of 
this lock at the beginning and end (respectively) of these blocks. We now do 
the standard lockset analysis on this version of the program, and declare two 
accesses to be non-racy if they hold sets of locks with a non-empty intersection. 
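The lockset check described above can be sketched in a few lines of Python. This is an illustrative reading, not the authors' implementation: accesses are given as hypothetical tuples (thread, variable, is_write, lockset), and the lock name "D1" stands for the notional lock attached to one pair of disjoint blocks.

```python
from itertools import combinations


def racy_pairs(accesses):
    """Return pairs of conflicting accesses whose locksets do not intersect.

    accesses: list of (thread, variable, is_write, lockset) tuples,
    where lockset is a frozenset of notional lock names.
    """
    races = []
    for a, b in combinations(accesses, 2):
        (ta, va, wa, la), (tb, vb, wb, lb) = a, b
        conflicting = ta != tb and va == vb and (wa or wb)
        if conflicting and not (la & lb):
            races.append((a, b))
    return races


protected = [
    ("qsend", "msgw", True, frozenset({"D1"})),
    ("qrec_ISR", "msgw", True, frozenset({"D1"})),
]
print(racy_pairs(protected))  # both accesses hold D1, so no pair is reported
```

If the first access held no notional lock (for instance because the enclosing disjoint block disappears when a disableint is removed), the same pair would be reported as potentially racy.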

Finally, we show how to do data-flow analysis for such programs in a sound 
and efficient way. The basic idea is to construct a “sync-CFG” for the program 
by unioning the control-flow graphs of the threads, and adding sync edges that 
capture the synchronizes-with edges (going from the end of a block to the begin- 
ning of its paired block), for example line 14 to line 41 and line 49 to line 11. 
The sync-edges are shown by dashed arrows in the figure. We now do a standard 
“value-set” analysis (for example interval analysis) on this graph, keeping track 
of a set of values each variable can take. The resulting facts about a variable are 
guaranteed to be sound at points where the variable is accessed (or even “owned” 
in the sense that a notional read of the variable at that point is non-racy). For 
example an interval analysis on this program would give us that 0 < msgw at 
line 14. Finally, we could do a region-based value-set analysis, by identifying 
regions of variables that are accessed as a unit; for example msgw and len could
be in one region, while wtosend and wtorec could be in another. The figure
shows some facts inferred by a polyhedral analysis based on these regions, for 
the given program. 
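The construction of the sync-CFG can be pictured as a plain graph union. The Python sketch below is a simplified illustration with hypothetical helper names: per-thread CFGs are given as edge lists over line numbers, the intra-thread edges used in the example are invented, and only the two sync edges (14 to 41 and 49 to 11) are the ones mentioned in the text. An interval join, as used at merge points of an interval analysis, is included for completeness.

```python
def build_sync_cfg(thread_cfgs, sync_edges):
    """Union the per-thread CFG edges and add synchronizes-with edges."""
    edges = set()
    for cfg_edges in thread_cfgs.values():
        edges |= set(cfg_edges)
    edges |= set(sync_edges)
    return edges


def interval_join(a, b):
    """Least upper bound of two intervals (lo, hi)."""
    return (min(a[0], b[0]), max(a[1], b[1]))


# Hypothetical fragments of the two CFGs, plus the sync edges from the text.
cfgs = {"qsend": [(11, 12), (12, 13), (13, 14)], "qrec_ISR": [(41, 42)]}
sync_cfg = build_sync_cfg(cfgs, [(14, 41), (49, 11)])
print(sorted(sync_cfg))
```

A sequential data-flow analysis run over such a graph propagates facts along both kinds of edges, joining them wherever a sync edge and an intra-thread edge meet.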


3 Interrupt-Driven Programs 


The programs we consider have a finite number of (static) threads, with a des- 
ignated “main” thread in which execution begins. The threads access a set of 
shared global variables, some of which are used as “synchronization flags”, using 
a standard set of commands like assignment statements of the form x := e, 
conditional statements (if-then-else), loop statements (while), etc. In addi- 
tion, the threads can use commands like disableint, enableint (to disable 
and enable interrupts, respectively), suspendsch, resumesch (to suspend and 
resume the scheduler, respectively), while the main thread can also create a 
thread (enable it for execution). Table 1 shows the set of basic statements cmd_{V,T}
over a set of variables V and a set of threads T. 

We allow standard integer and Boolean expressions over a set of variables V.
For an integer expression e over V, and an environment φ for V, we denote by
⟦e⟧_φ the integer value that e evaluates to in φ. Similarly for a Boolean expression
b, we denote the Boolean value (true or false) that b evaluates to in φ by ⟦b⟧_φ.
For a set of environments Φ for a set of variables V, we define the set of integer
values that e can evaluate to in an environment in Φ, by ⟦e⟧_Φ = {⟦e⟧_φ | φ ∈ Φ}.
Similarly, for a Boolean expression b, we define the set of environments in Φ that
satisfy b to be ⟦b⟧_Φ = {φ ∈ Φ | ⟦b⟧_φ = true}.
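The two set-level definitions above translate directly into Python. In this small sketch, environments are modelled as dictionaries from variable names to integers and expressions as Python functions over an environment; both choices, and the function names, are assumptions made for illustration.

```python
def eval_set(e, envs):
    """Set of values that expression e can take over the environments envs."""
    return {e(phi) for phi in envs}


def satisfying(b, envs):
    """Environments in envs in which the Boolean expression b is true."""
    return [phi for phi in envs if b(phi)]


envs = [{"x": 1, "y": 2}, {"x": 3, "y": 0}]
print(eval_set(lambda phi: phi["x"] + 1, envs))
print(satisfying(lambda phi: phi["y"] > 0, envs))
```

Note that the result of eval_set can be smaller than the set of environments, since different environments may give the same value.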

Each thread is of one of two types: “task” threads that are like standard 
threads, and “ISR” threads that represent threads that run as interrupt ser- 
vice routines. The main thread is a task thread, which is the only task thread 
enabled initially. The main thread can enable other threads (both task and ISR) 
for execution using the create command. Task threads can be preempted by 
other task threads (whenever interrupts are not disabled, and the scheduler is 
not suspended) or by ISR threads (whenever interrupts are not disabled). On 
the other hand ISR threads cannot be preempted and are assumed to run to 
completion. 
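The preemption rules in the paragraph above can be summarised as a single predicate. The following Python sketch is an informal paraphrase with a hypothetical function name and parameters; it decides whether a given thread may take the next step, given the currently running thread and the two status flags.

```python
def may_step(candidate, candidate_is_isr, running, running_is_isr,
             int_disabled, sched_suspended):
    """Can `candidate` execute next while `running` is the running thread?"""
    if candidate == running:
        return True           # the running thread may always continue
    if running_is_isr or int_disabled:
        return False          # ISRs run to completion; no preemption with interrupts off
    if candidate_is_isr:
        return True           # an ISR may preempt a task when interrupts are enabled
    return not sched_suspended  # another task needs the scheduler to be active


# An ISR can still preempt a task that has only suspended the scheduler.
print(may_step("qrec_ISR", True, "qsend", False,
               int_disabled=False, sched_suspended=True))
```

The asymmetry between the last two cases is what distinguishes disabling interrupts from suspending the scheduler: the former excludes every other thread, the latter only other task threads.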

Only task threads are allowed to use disableint, enableint, suspendsch 
and resumesch commands. Similarly, if flag-based synchronization is used, only 
task threads can modify the flag variable, while an ISR can only check whether 
the flag is set or not, and perform some actions accordingly. 

Formally we represent an interrupt-driven program P as a tuple (V, T) where
V is a finite set of integer variables, and T is a finite set of named threads. Each
thread t ∈ T has a type which is one of task or ISR, and an associated
control-flow graph of the form G_t = (L_t, s_t, inst_t) where L_t is a finite set of
locations of thread t, s_t ∈ L_t is the start location of thread t, and
inst_t ⊆ L_t × cmd_{V,T} × L_t is a finite set of instructions of thread t.

Some definitions related to threads will be useful going forward. We denote
by L_P = ⋃_{t∈T} L_t the disjoint union of the thread locations. Whenever P is clear
from the context we will drop the subscript of P from L_P and its decorations.
For a location l ∈ L we denote by tid(l) the thread t which contains location l.
We denote the set of instructions of P by inst_P = ⋃_{t∈T} inst_t. For an instruction
ι ∈ inst_t we will also write tid(ι) to mean the thread t. For an instruction
ι = (l, c, l′), we call l the source location, and l′ the target location of ι.

Table 1. Basic statements cmd_{V,T} over variables V and threads T

Command    | Description
-----------+----------------------------------------------------------------
skip       | Do nothing
x := e     | Assign the value of expression e to variable x ∈ V
assume(b)  | Enabled only if expression b evaluates to true, acts like skip
create(t)  | Enable thread t ∈ T for execution
disableint | Disable interrupts and context switches
enableint  | Enable interrupts and context switches
suspendsch | Suspend the scheduler (other task threads cannot preempt the
           | current thread); also sets ssflag variable
resumesch  | Resume the scheduler (other task threads can now preempt the
           | current thread); also unsets ssflag variable

We denote the set of commands appearing in program P by cmd(P). We will 
consider an assignment x := e as a write-access to x, and as a read-access to 
every variable that appears in the expression e. Similarly, assume(b) is considered 
to be a read-access of every variable that occurs in expression b. We say two 
accesses are conflicting accesses if they are read/write accesses to the same 
variable, and at least one of them is a write. We assume that the control-flow 
graph of each thread comes from a well-structured program. Finally, we assume 
that the main thread begins by initializing the variables to constant values. 
Figure 2 shows an example program and the control-flow-graphs of its threads. 
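The notions of read-access, write-access, and conflicting accesses defined above can be made concrete with a short Python sketch. Commands are encoded here as hypothetical tuples, ("assign", x, vars_of_e) and ("assume", vars_of_b); this encoding and the function names are assumptions for the example, and the check deliberately ignores which threads the commands belong to.

```python
def accesses(cmd):
    """Return (reads, writes) for a command.

    ("assign", x, vars_of_e): writes x, reads every variable in e.
    ("assume", vars_of_b):    reads every variable in b.
    Any other command (e.g. ("skip",)) accesses no variables.
    """
    kind = cmd[0]
    if kind == "assign":
        _, target, expr_vars = cmd
        return set(expr_vars), {target}
    if kind == "assume":
        return set(cmd[1]), set()
    return set(), set()


def conflicting(c1, c2):
    """True if c1 and c2 access a common variable and one access is a write."""
    r1, w1 = accesses(c1)
    r2, w2 = accesses(c2)
    return bool(w1 & (r2 | w2)) or bool(w2 & r1)


# x := x + 1 conflicts with t := x, because x is written by one and read by the other.
print(conflicting(("assign", "x", ["x"]), ("assign", "t", ["x"])))
```

Two reads of the same variable never conflict under this definition, which is why two assume commands over the same variables are not reported.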

We define the operational semantics of an interrupt-driven program using a
labeled transition system (LTS). Let P = (V, T) be a program. We define an
LTS T_P = (Q, Σ, s, ⇒) corresponding to P, where:

- Q is a set of states of the form (pc, φ, enab, rt, it, id, ss), where pc ∈ T → L is
  the program counter giving the current location of each thread, φ ∈ V → ℤ
  is a valuation for the variables, enab ⊆ T is the set of enabled threads, rt ∈ T
  is the currently running thread; it ∈ T is the task thread which is interrupted
  when the scheduler is suspended; and id and ss are Boolean values telling us
  whether interrupts are disabled (id = true) or not (id = false) and whether
  the scheduler is suspended (ss = true) or not (ss = false).

- The set of labels Σ is the set of instructions inst_P of P.

– The initial state s is (λt.s_t, λx.0, {main}, main, main, false, false). Thus all 
threads are at their entry locations, the initial environment sets all variables 
to 0, only the main thread is enabled and running, the interrupted task is 
set to main (this is a dummy value as it is used only when the scheduler is 
suspended), interrupts are enabled, and the scheduler is not suspended. 


main: 
 1. x := 0; 
 2. y := 0; 
 3. t := 0; 
 4. create(t1); 
 5. create(t2); 

t1: 
 7. x := x + 1; 

t2: 
 9. disableint; 
10. y := t; 
11. t := x; 
12. if (t > 0) { 
13.   y := y + 1; 
14. } 
15. else { 
16.   t := t + 1; 
17. } 
18. enableint; 

(a) Example program 

(b) Control-flow-graph representation (graph drawing not recoverable from the source) 


Fig. 2. An example program and its CFG representation. 

– For an instruction ι = (l, c, l') in inst_P, with tid(ι) = t, we define 


(pc, φ, enab, rt, it, id, ss) ⇒_ι (pc', φ', enab', rt', it', id', ss') 


iff the following conditions are satisfied: 
  • t ∈ enab; pc(t) = l; pc' = pc[t ↦ l']; 
  • if id is true or rt is an ISR then t = rt; 
  • if ss is true, then either t = rt or t is an ISR thread; 
  • Based on the command c, the following conditions must be satisfied: 
    ∗ If c is the skip command then φ' = φ, enab' = enab, id' = id, and 
      ss' = ss. 
    ∗ If c is an assignment statement of the form x := e then φ' = φ[x ↦ ⟦e⟧_φ], 
      enab' = enab, id' = id, and ss' = ss. 
    ∗ If c is a command of the form assume(b) then ⟦b⟧_φ = true, φ' = φ, 
      enab' = enab, id' = id, and ss' = ss. 
    ∗ If c is a create(u) command then t = main, φ' = φ, enab' = enab ∪ {u}, 
      id' = id, and ss' = ss. 
    ∗ If c is the disableint command then φ' = φ, enab' = enab, id' = true, 
      and ss' = ss. 
    ∗ If c is the enableint command then φ' = φ, enab' = enab, id' = false, 
      and ss' = ss. 
    ∗ If c is the suspendsch command then φ' = φ[ssflag ↦ 1], enab' = 
      enab, id' = id, and ss' = true. 
    ∗ If c is the resumesch command then φ' = φ[ssflag ↦ 0], enab' = enab, 
      id' = id, and ss' = false. 
  • In addition, the transitions set the new running thread rt' and interrupted 
    task it' as follows. If t is an ISR thread, ss is true, and ι is the first 
    statement of t then it' = rt, rt' = t. If t is an ISR thread, ss is true, and ι 
    is the last statement of t then it' = it, rt' = it. In all other cases, rt' = t 
    and it' = it. 
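
The transition rule can be read as a successor function on states. The following Python sketch follows the conditions listed above; the `State` class, the tuple encoding of instructions, and the extra arguments `isrs`, `first` and `last` are assumptions made for illustration. Expressions are modelled as Python callables on the environment, and an ISR instruction that is both first and last in its thread is treated as a first statement here, a case the rule above leaves open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    pc: dict          # thread -> current location
    env: dict         # variable -> integer value (the valuation)
    enab: frozenset   # enabled threads
    rt: str           # currently running thread
    it: str           # interrupted task thread
    id: bool          # True iff interrupts are disabled
    ss: bool          # True iff the scheduler is suspended


def step(q, instr, isrs=frozenset(), first=frozenset(), last=frozenset()):
    """Successor of q under instr = (t, l, cmd, l2), or None if instr is not enabled.

    isrs is the set of ISR threads; first/last hold the instructions that are
    the first/last statement of their thread (all three are assumed inputs).
    """
    t, l, cmd, l2 = instr
    if t not in q.enab or q.pc[t] != l:
        return None
    if (q.id or q.rt in isrs) and t != q.rt:
        return None
    if q.ss and t != q.rt and t not in isrs:
        return None

    env, enab, id_, ss = dict(q.env), q.enab, q.id, q.ss
    kind = cmd[0]
    if kind == "assign":            # ("assign", x, e) with e: env -> int
        env[cmd[1]] = cmd[2](q.env)
    elif kind == "assume":          # ("assume", b) with b: env -> bool
        if not cmd[1](q.env):
            return None
    elif kind == "create":          # ("create", u), only allowed in main
        if t != "main":
            return None
        enab = enab | {cmd[1]}
    elif kind == "disableint":
        id_ = True
    elif kind == "enableint":
        id_ = False
    elif kind == "suspendsch":
        env["ssflag"], ss = 1, True
    elif kind == "resumesch":
        env["ssflag"], ss = 0, False
    elif kind != "skip":
        raise ValueError(f"unknown command {kind!r}")

    rt, it = t, q.it                # default case: rt' = t, it' = it
    if t in isrs and q.ss and instr in first:
        rt, it = t, q.rt            # ISR starts: remember the interrupted task
    elif t in isrs and q.ss and instr in last:
        rt, it = q.it, q.it         # ISR ends: resume the interrupted task
    return State({**q.pc, t: l2}, env, enab, rt, it, id_, ss)
```

Returning `None` for a disabled instruction keeps the sketch close to the "iff" in the definition: a transition exists exactly when `step` yields a state.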


An execution σ of P is a finite sequence of transitions in T_P from the initial 
state s: σ = τ_0, τ_1, ..., τ_n (n ≥ 0) from ⇒, such that there exists a sequence 
of states q_0, q_1, ..., q_{n+1} from Q, with q_0 = s and τ_i = (q_i, ι_i, q_{i+1}) for each 
0 ≤ i ≤ n. Wherever convenient we will also represent an execution like σ above 
as a sequence of the form q_0 ⇒_{ι_0} q_1 ⇒_{ι_1} ··· ⇒_{ι_n} q_{n+1}. We say that a state q ∈ Q 
is reachable in program P if there is an execution of P leading to state q. 


4 Data Races and Happens-Before Ordering 


In this section we propose a definition of a data race which has general applicabil- 
ity, and also define a natural happens-before order for interrupt-driven programs. 


4.1 Data Races 


Data races have typically been defined in the literature in terms of a 
happens-before order on program executions. In the classical setting of lock-based 
synchronization, the happens-before relation is a partial order on the instructions in 
an execution, namely the reflexive-transitive closure of the union of the program-order 
relation between two instructions in the same thread, and the synchronizes-with 
relation which relates a release of a lock in a thread to the next acquire of the 
same lock in another thread. Two instructions in an execution are then defined 
to be involved in a data race if they are conflicting accesses to a shared variable 
and are not ordered by the happens-before relation. 

We feel it is important to have a definition of a data race that is based on the 
operational semantics of the class of programs we are interested in, and not on a 
happens-before relation. Such a definition would more tangibly capture what it 
is that a programmer typically tries to avoid when dealing with shared variables 
whose consistency she is worried about. Moreover, when coming up with a defi- 
nition of the happens-before order (the synchronizes-with relation in particular) 
for non-standard concurrent programs like interrupt-driven programs, it is use- 
ful to have a reference notion to relate to. For instance, one could show that a 
proposed happens-before order is strong enough to ensure the absence of races. 

We propose to define a race between two conflicting statements in a program 
in terms of whether two imaginary blocks enclosing each of these statements can 
overlap in an execution. Let us consider a multi-threaded program P in a class of 
concurrent programs with a certain operational execution semantics. Consider a 
block of contiguous instructions in a thread t of a program P and another block 
in thread t’ of P. We say that these two blocks are involved in a high-level race 
in an execution of P if they overlap with each other during the execution, in that 
one block begins in between the beginning and ending of the other. We say two 
conflicting statements s and t in P are involved in a data race (or are racy), if 
the following condition is true: Consider the program P’ which is obtained from 
P by replacing the statement s by the block “skip; s; skip”, and similarly for 
statement t. Then there is an execution of P’ in which the two blocks containing 
s and t are involved in a high-level race. The definition is illustrated in Fig. 3. 
We say a program P is race-free if no pair of instructions in it are racy. 
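
As a rough illustration of the overlap condition, the following Python sketch checks whether two blocks overlap in a given execution. The trace encoding (a list of `(block, "begin" | "end")` events in execution order) and the function name `blocks_overlap` are assumptions introduced here; in the augmented program P', a block begins when its leading skip executes and ends when its trailing skip executes.

```python
def blocks_overlap(trace, block_a, block_b):
    """True iff one of the two blocks begins while the other has begun but not ended."""
    open_blocks = set()
    for name, kind in trace:
        if name not in (block_a, block_b):
            continue  # events of other blocks are irrelevant here
        if kind == "begin":
            if open_blocks - {name}:
                return True  # the other block is still open
            open_blocks.add(name)
        else:
            open_blocks.discard(name)
    return False
```

Two statements would then be racy if some execution of P' yields a trace on which `blocks_overlap` returns `True` for their enclosing skip blocks; enumerating those executions is outside the scope of this sketch.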




Fig. 3. Illustrating the definition of a data race on statements s and t. A program P, 
its transformation P’, and an execution of P’ in which the blocks overlap. 


The rationale for this definition is that the concerned statements s and t may 
be compiled down to a sequence of instructions (represented by the blocks with 
skip’s around s and t) depending on the underlying processor and compiler, 
and if these instructions interleave in an execution, it may lead to undesirable 
results. 

To illustrate the definition, consider the program in Fig. 2a. The accesses to 
x in line 7 and line 11 can be seen to be racy, since there is an execution of the 
augmented program P’ in which t1 performs the skip followed by the increment 
to x at line 7, followed by a context switch to thread t2 which goes on to execute 
lines 9 and 10 and then the read of x in line 11. On the other hand, the version 
of the program in which line 7 is enclosed in a disableint-enableint block, 
does not contain a race. 

We note that for classical concurrent programs, it might suffice to define a 
race as consecutive occurrences of conflicting accesses in an execution, as done in 
[4,17]. However, this definition is not general enough to apply to interrupt-driven 
programs. By this definition, the statements in lines 7 and 11 of the program in 
Fig. 2a are not racy, as there is no execution in which they happen consecutively. 
This is because the disableint-enableint block containing the access in line 11 
is “atomic” in that the statements in the block must happen contiguously in any 
execution, and hence the instructions corresponding to line 7 and line 11 can 
never happen immediately one after another. 


4.2 Disjoint Blocks and the Happens-Before Relation 


Now that we have a proposed definition of races, we can proceed to give a 
principled way to define the happens-before relation for our class of 
interrupt-driven programs. The main question is how does one define the synchronizes- 
with relation. Our insight here is that the key to defining the synchronizes-with 
relation lies in identifying what we call disjoint blocks for the class of programs. 
Disjoint blocks are statically identifiable pairs of path segments in the CFGs of 
different threads, which are guaranteed by the execution semantics of the class 
of programs never to overlap in an execution of the program. Disjoint block 
structures — for example in the form of blocks enclosed between locks/unlocks of 
the same lock — are the primary mechanism used by developers to ensure race- 
freedom. The synchronizes-with relation in an execution can then be defined as 
relating, for every pair (A, B) of disjoint blocks in the program, the end of block 
A to the beginning of the succeeding occurrence of block B in the execution. The 
happens-before order for an execution can now be defined, as before, in terms 
of the program order and the synchronizes-with order, and is easily seen to be 
sufficient to ensure non-raciness. 

Let us illustrate this hypothesis on classical lock-based programs. The disjoint 
block pairs for this class of programs are segments of code enclosed between 
acquires and releases of the same lock; or the portion of a thread’s code before it 
spawns a thread t, and the whole of thread t’s code; and similarly for joins. The 
synchronizes-with relation between instructions in an execution essentially goes 
from a release to the succeeding acquire of the same lock. If two accesses are 
related by the resulting happens-before order, they clearly cannot be involved 
in a race. 

We now focus on defining a happens-before relation based on disjoint blocks 
for our class of interrupt-driven programs. We have identified eight pairs of 
disjoint block patterns for this class of programs, which are depicted in Fig. 4. 
We use the following types of blocks to define the pairs. A block of type D is 
a path segment in a task thread that begins with a disableint and ends with 
an enableint with no intervening enableint in between. A block of type S 
is a path segment in a task thread that begins with a suspendsch and ends 
with a resumesch with no intervening resumesch. An I block is an initial and 
terminating path segment in an ISR thread (i.e. begins with the first instruction 
and ends with a terminating instruction). Similarly, for a task thread t, T_t is 
an initial and terminating path in t, while M_t is an initial segment of the main 
thread that ends with a create(t) command. A block of type C_ssflag is a path 
segment in an ISR thread corresponding to the then block of a conditional that 
checks if ssflag = 0. For a synchronization flag f, C_f is the path segment in 
an ISR thread corresponding to the then block of a conditional that checks if 
f = 0. Finally F_f is a segment between statements that set f to 1 and back to 
0, in a task thread. We also require that an F_f segment be within the scope of 
a suspendsch command. 

We can now describe the pairs of disjoint blocks depicted in Fig. 4. Case (a) 
says that two D blocks in different task threads are disjoint. Clearly two such 
blocks can never overlap in an execution, since once one of the blocks begins 
execution no context-switch can occur until interrupts are enabled again. Case (b) 
says that D and I blocks are disjoint. Once again this is because once the D block 
begins execution no ISR can run until interrupts are enabled again, and once 
an ISR begins execution it runs to completion without any context-switches. 
Case (e) says that S blocks in different task threads are disjoint, because once 
the scheduler is suspended no context-switch to another task thread can occur. 
Case (f) says that M_t and T_t blocks are disjoint, since a thread cannot begin 
execution before it is created in main. Case (g) says that an S block is disjoint 
from a C_ssflag block. This is because once the scheduler is suspended by the 
suspendsch command, the then block of the if statement will not execute even 
if a context-switch to an ISR occurs. Conversely, if the ISR is running 
there can be no context-switch to another thread. Finally, case (h) is similar to 
case (g). We note that the disjoint block pairs are not ordered (the relation is 
symmetric). 


Case | First block                                          | Second block 
(a)  | task: disableint; D; enableint                       | task: disableint; D; enableint 
(b)  | task: disableint; D; enableint                       | ISR: // begin; I; // end 
(c)  | ISR: // begin; I; // end                             | ISR: // begin; I; // end 
(d)  | task: disableint; D; enableint                       | task: suspendsch; S; resumesch 
(e)  | task: suspendsch; S; resumesch                       | task: suspendsch; S; resumesch 
(f)  | main: // begin; M_t; create(t)                       | t: // begin; T_t; // end 
(g)  | task: suspendsch; S; resumesch                       | ISR: if (ssflag = 0) { C_ssflag } 
(h)  | task (scheduler suspended): f := 1; F_f; f := 0      | ISR: if (f = 0) { C_f } 


Fig. 4. Disjoint blocks in an interrupt-driven program. 



We can now define the synchronizes-with relation as follows. Let σ = q_0 ⇒_{ι_0} 
q_1 ⇒_{ι_1} ··· ⇒_{ι_n} q_{n+1} be an execution of P. We say instruction ι_i synchronizes- 
with an instruction ι_j of P in σ, if i < j, tid(ι_i) ≠ tid(ι_j), and there exists a pair 
of disjoint blocks A and B, with ι_i ending block A and ι_j beginning block B. As 
usual we say ι_i is program-order related to ι_j iff i < j and tid(ι_i) = tid(ι_j). We 
define the happens-before relation on σ as the reflexive-transitive closure of the 
union of the program-order and synchronizes-with relations for σ. 

We can now define a HB-race in an execution σ of P as follows: we say that 
two instructions ι_i and ι_j in σ are involved in a HB-race if they are conflicting 
instructions that are not ordered by the happens-before relation in σ. We say 
that two instructions in P are HB-racy if there is an execution of P in which 
they are involved in a HB-race. Finally, we say a program P is HB-race-free if 
no two of its instructions are HB-racy. 
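
The happens-before relation of a single execution can be computed directly from these definitions. The sketch below is a minimal Python rendering under stated assumptions: an execution is a list of `(thread, label)` pairs, the dictionaries `ends` and `begins` map an instruction label to the type of block it ends or begins, and `disjoint_pairs` lists the disjoint block-type pairs. These encodings and the function names are hypothetical, and block types stand in for the concrete block pairs of the definition.

```python
from itertools import product


def happens_before(execution, ends, begins, disjoint_pairs):
    """Return the set of index pairs (i, j) such that position i happens-before j."""
    n = len(execution)
    hb = {(i, i) for i in range(n)}                      # reflexive part
    for i in range(n):
        for j in range(i + 1, n):
            (ti, li), (tj, lj) = execution[i], execution[j]
            if ti == tj:
                hb.add((i, j))                           # program order
            elif (ends.get(li), begins.get(lj)) in disjoint_pairs:
                hb.add((i, j))                           # synchronizes-with
    # Transitive closure (Warshall); product varies k slowest, so k is the outer loop.
    for k, i, j in product(range(n), repeat=3):
        if (i, k) in hb and (k, j) in hb:
            hb.add((i, j))
    return hb


def hb_races(execution, hb, conflicts):
    """Index pairs (i, j), i < j, of conflicting instructions not ordered by hb."""
    n = len(execution)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if conflicts(execution[i][1], execution[j][1]) and (i, j) not in hb
    ]
```

Because every relation added above goes forward in the execution, checking `(i, j) not in hb` for i < j is enough to decide that two positions are unordered.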

Once again, it is fairly immediate to see that if two statements of a program 
are not involved in a HB-race, they cannot be involved in a race. Further, if 
two statements belong to disjoint blocks, then they are clearly happens-before 
ordered in every execution. Hence belonging to disjoint blocks is sufficient to 
ensure that the statements are happens-before ordered, which in turn ensures 
that the statements cannot be involved in a race. 


5 Sync-CFG Analysis for Interrupt-Driven Programs 


In this section we describe a way of lifting a sequential value-set analysis in 
a sound way for a HB-race free interrupt-driven program, in a similar way to 
how it is done for lock-based concurrent programs in [11]. A value-set analysis 
keeps track of the set of values each variable can take at each program point. 
The basic idea is to create a “sync-CFG” for a given interrupt-driven program 
P, which is essentially the union of the CFGs of each thread of P, along with 
“may-synchronize-with” edges between statements that may be synchronizes- 
with related in an execution of P, and then perform the value-set analysis on 
the resulting graph. Whenever the given program is HB-race free, the result of 
the analysis is guaranteed to be sound, in a sense made clear in Theorem 1. 


5.1 Sync-CFG 


We begin by defining the “sync-CFG” for an interrupt-driven program. It is 
on this structure that we will do the value-set analysis. Let P = (V,T) be 
an interrupt-driven program, and let G be the disjoint union (over threads 
t ∈ T) of the CFGs G_t. We define a set of may-synchronize-with edges in G, 
denoted MSW(G), as follows. The edges correspond to the pairs of disjoint blocks 
depicted in Fig. 4, in that they connect the ending of one block to the beginning 
of the other block in the pair. Consider two instructions ι = (l, c, m) ∈ inst_t 
and κ = (l', c', m') ∈ inst_{t'}, with t ≠ t'. We add the edge (m, l') in MSW(G), 
iff for some pair of disjoint blocks (A, B), ι ends a block of type A in thread t 
and κ begins a block of type B in thread t'. For example, corresponding to a 
(D, D) pair of disjoint blocks, we add the edge (m, l') when c is an enableint 
command, and c' is a disableint command. 

The sync-CFG induced by P is the control flow graph given by G along with 
the additional edges in MSW(G). Figure 6 shows a program P_2, and its induced 
sync-CFG. 
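
The construction of MSW(G) can be sketched as a double loop over instructions. In this Python sketch the instruction encoding `(thread, l, cmd, m)`, the dictionaries `ends` and `begins` (mapping a command name to the block type it ends or begins), and the function name `msw_edges` are assumptions for illustration. Classifying blocks by command name alone is a simplification: it ignores, for example, whether a thread is a task or an ISR.

```python
def msw_edges(instructions, ends, begins, disjoint_pairs):
    """Return the edges (m, l2) from the target of a block-ending instruction
    to the source of a block-beginning instruction in a different thread."""
    edges = set()
    for (t, _, c, m) in instructions:
        for (t2, l2, c2, _) in instructions:
            if t != t2 and (ends.get(c), begins.get(c2)) in disjoint_pairs:
                edges.add((m, l2))
    return edges
```

With `ends = {"enableint": "D"}`, `begins = {"disableint": "D"}` and the single pair `("D", "D")`, this yields exactly the edges described in the example above: from the location after each enableint to the location before each disableint of another thread.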


5.2 Value Set Analysis 


We first spell out the particular form of abstract interpretation we will be using. 
It is similar to the standard formulation of [9], except that it is a little more 
general to accommodate non-standard control-flow graphs like the sync-CFG. 
An abstract interpretation of a program P = (V, T) is a structure of the form 
A = (D, ≤, d_0, F) where 


– D is the set of abstract states. 

– (D, ≤) forms a complete lattice. We denote the join (least upper bound) in 
this lattice by ⊔_≤, or simply ⊔ when the ordering is clear from the context. 

– d_0 ∈ D is the initial abstract state. 

– F : inst_P → (D → D) associates a transfer function F(ι) (or simply F_ι) with 
each instruction ι of P. We require each transfer function F_ι to be monotonic, 
in that whenever d ≤ d' we have F_ι(d) ≤ F_ι(d'). 


An abstract interpretation A = (D, ≤, d_0, F) of P induces a “global” transfer 
function F_A : D → D, given by F_A(d) = d_0 ⊔ ⨆_{ι ∈ inst_P} F_ι(d). This transfer 
function can also be seen to be monotonic. By the Knaster-Tarski theorem [28], 
F_A has a least fixed point (LFP) in D, which we denote by LFP(F_A), and refer 
to as the resulting value of the analysis. 

A value set for a set of variables V is a map vs : V → 2^ℤ, associating a 
set of integer values with each variable in V. A value set vs induces a set of 
environments Φ_vs in a natural way: Φ_vs = {φ | for all x ∈ V, φ(x) ∈ vs(x)} 
(i.e. essentially the Cartesian product of the value sets). Conversely, a set of 
environments Φ for V induces a value set valset(Φ) given by valset(Φ)(x) = 
{v ∈ ℤ | ∃φ ∈ Φ, φ(x) = v}, which is the “projection” of the environments to 
each variable x ∈ V. Finally, we define a point-wise ordering on value sets as 
follows: vs ≤ vs' iff vs(x) ⊆ vs'(x) for each variable x in V. We denote the least 
element in this ordering by vs_⊥ = λx.∅. 
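
The two conversions and the ordering are easy to state executably. The following Python sketch uses dictionaries for environments and for value sets; the function names `envs_of`, `valset` and `leq` are chosen here for illustration and restrict attention to finite value sets.

```python
from itertools import product


def envs_of(vs):
    """All environments induced by a finite value set (its Cartesian product)."""
    names = sorted(vs)
    return [
        dict(zip(names, vals))
        for vals in product(*(sorted(vs[x]) for x in names))
    ]


def valset(envs, variables):
    """Project a collection of environments onto each variable."""
    return {x: {env[x] for env in envs} for x in variables}


def leq(vs1, vs2):
    """Point-wise ordering on value sets over the same variables."""
    return all(vs1[x] <= vs2[x] for x in vs1)
```

Note that `valset` applied to the two environments {x ↦ 0, y ↦ 0} and {x ↦ 1, y ↦ 1} gives x ↦ {0, 1}, y ↦ {0, 1}, whose induced environments number four: the projection forgets that x and y were equal.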

We can now define the value-set analysis A_vset for an interrupt-driven 
program P = (V, T) as follows. Let A_vset = (D, ≤, d_0, F) where 


– D is the set L_P → (V → 2^ℤ) (thus an element of D associates a value-set 
with each program location). 

– The ordering d ≤ d' holds iff d(l) ≤ d'(l) for each l ∈ L_P. 

– The initial abstract value d_0 is given by: 

$$ d_0 = \lambda l.\ \begin{cases} \lambda x.\{0\} & \text{if } l = s_{main} \\ vs_\bot & \text{otherwise.} \end{cases} $$ 

– The transfer functions are given as follows. Given an abstract value d, and 
a location l ∈ L_P, we define vs_l^d to be the join of the value-set at l, and 
the value-sets at the sources of all may-synchronizes-with edges coming into l. Thus 

$$ vs_l^d = d(l) \sqcup \bigsqcup_{(n,l) \in MSW(G)} d(n). $$ 

Below we will use Φ as an abbreviation of the set Φ_{vs_l^d} of environments 
induced by vs_l^d. Let ι = (l, c, l') be an instruction in P. 

  • If c is the command x := e then F_ι(d) = d' where 

$$ d'(m) = \begin{cases} vs_l^d[x \mapsto [\![e]\!]_\Phi] & \text{if } m = l' \\ vs_\bot & \text{otherwise.} \end{cases} $$ 

  • If c is the command assume(b), then F_ι(d) = d' where 

$$ d'(m) = \begin{cases} valset(\{\varphi \in \Phi \mid [\![b]\!]_\varphi = true\}) & \text{if } m = l' \\ vs_\bot & \text{otherwise.} \end{cases} $$ 

  • If c is any other command (skip, disableint, enableint, suspendsch, 
    resumesch, or create) then F_ι(d) = d' where 

$$ d'(m) = \begin{cases} vs_l^d & \text{if } m = l' \\ vs_\bot & \text{otherwise.} \end{cases} $$ 

Figure 6 shows the results of a value-set analysis on the sync-CFG of program 
P_2. The data-flow facts are shown just before a statement, at selected points in 
the program. 
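
To make the fixed-point computation concrete, here is a minimal Python sketch that iterates the global transfer function from the bottom element until it stabilises. It is a sketch under assumptions, not the authors' implementation: instructions are `(l, cmd, l2)` triples with expressions as callables, the encoding of P_2 is hand-written, exit locations are named `"em"`, `"e1"`, `"e2"`, and the MSW edge set is derived by hand from the (D, D) and (M_t, T_t) patterns.

```python
from itertools import product


def analyse(locations, instructions, msw, entry, variables):
    """Least fixed point of F_A(d) = d0 join (join of F_i(d)) by Kleene iteration."""
    bottom = {x: frozenset() for x in variables}
    d0 = {l: bottom for l in locations}
    d0[entry] = {x: frozenset({0}) for x in variables}

    def join(a, b):
        return {x: a[x] | b[x] for x in variables}

    def envs(vs):  # environments induced by a value set
        return [dict(zip(variables, vals))
                for vals in product(*(vs[x] for x in variables))]

    d = {l: bottom for l in locations}
    while True:
        new = dict(d0)
        for (l, cmd, l2) in instructions:
            vs = d[l]
            for (n, m) in msw:          # join facts along incoming MSW edges
                if m == l:
                    vs = join(vs, d[n])
            phi = envs(vs)
            if cmd[0] == "assign":      # ("assign", x, e)
                out = {**vs, cmd[1]: frozenset(cmd[2](e) for e in phi)}
            elif cmd[0] == "assume":    # ("assume", b)
                out = {x: frozenset(e[x] for e in phi if cmd[1](e))
                       for x in variables}
            else:
                out = vs
            new[l2] = join(new[l2], out)
        if new == d:
            return d
        d = new


# Hand-encoded P_2 (assumed encoding); location 1 already holds x = y = t = 0.
VARS = ["x", "y", "t"]
P2 = [
    (1, ("skip",), 2), (2, ("create", "t1"), 3), (3, ("create", "t2"), "em"),
    (4, ("assign", "y", lambda e: e["y"] + 1), 5), (5, ("disableint",), 6),
    (6, ("assign", "x", lambda e: e["y"]), 7), (7, ("enableint",), "e1"),
    (8, ("disableint",), 9), (9, ("assign", "t", lambda e: e["x"]), 10),
    (10, ("skip",), 11), (11, ("enableint",), "e2"),   # assert treated as skip
]
MSW = {(3, 4), ("em", 8), ("e1", 8), ("e2", 5)}
LOCS = [1, 2, 3, "em", 4, 5, 6, 7, "e1", 8, 9, 10, 11, "e2"]
result = analyse(LOCS, P2, MSW, 1, VARS)
```

On this encoding the value set for t at location 10 contains only values at most 1, which is what the assertion in P_2 requires. The iteration terminates here because no cycle of the sync-CFG passes through the increment of y; in general a widening or a finite abstract domain would be needed.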


Soundness. The value-set analysis is sound in the following sense: if P is a HB- 
race free program, and we have a reachable state of P at a location l in a thread 
where a variable x is read; then the value of x in this state is contained in the 
value-set for x, obtained by the analysis at point l. More formally: 


Theorem 1. Let P = (V, T) be an HB-race free interrupt-driven program, and 
let d* be the result of the analysis A_vset on P. Let l be a location in a thread 
t ∈ T where a variable x is read (i.e. P contains an instruction of the form 
(l, c, l') where c is a read access of x). Let φ be an environment at l reachable 
via some execution of P. Then φ(x) ∈ d*(l)(x). 


The proof of this theorem is similar to the one for classical concurrent pro- 
grams in [11] (see [10] for a more accurate proof). The soundness claim can 
be extended to locations where a variable is “owned” (which includes locations 
where it is read). We say a variable x is owned by a thread t at location l, if an 
inserted read of x at this point is non-HB-racy in the resulting program. 


Region-Based Analysis. One problem with the value-set analysis is that it may 
not be able to prove relational invariants (like x < y) for a program. One way 
to remedy this is to exploit the fact that concurrent programs often ensure race- 
free access to a region of variables, and to essentially do a region-based value-set 
analysis, as originally done in [21]. More precisely, let us say we have a partition 
of the set of variables V of a program P into a set of regions R_1, ..., R_n. We 
classify each read (write) access to a variable x in a region R as a read (write) 
access to region R. We say that two instructions in an execution of P are involved 
in a HB-region-race, if the two instructions are conflicting accesses to the same 
region R, and are not happens-before ordered in the execution. A program is 
HB-region-race free if none of its executions contain a HB-region-race. 
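
The precision gained by grouping variables can be seen on a two-variable example. The Python sketch below contrasts a per-variable abstraction with one that keeps the values of a region's variables together as tuples; the function names and the sample environments are assumptions made for illustration.

```python
from itertools import product


def per_variable(envs, variables):
    """Independent set of values for each variable."""
    return {x: {env[x] for env in envs} for x in variables}


def per_region(envs, region):
    """Set of joint valuations of the variables in a region."""
    return {tuple(env[x] for x in region) for env in envs}


# Hypothetical reachable environments, all satisfying x < y.
reachable = [{"x": 0, "y": 1}, {"x": 1, "y": 2}]

split = per_variable(reachable, ["x", "y"])
joint = per_region(reachable, ("x", "y"))

# Invariant x < y checked against each abstraction.
holds_split = all(x < y for x, y in product(split["x"], split["y"]))
holds_joint = all(x < y for x, y in joint)
```

Here `holds_joint` is true while `holds_split` is false, because the per-variable abstraction also admits the combination x = 1, y = 1.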

We can now define a region-based version of the value-set analysis for a 
program P, which we call A_rvset. The value-set for a region R is a set of valuations 
(or sub-environments) for the variables in R. The transfer functions are defined 
in an analogous way to the value-set analysis. The analogue of Theorem 1 for 
regions gives us that for a HB-region-race free program, at any location where a 
region R is accessed, the region-value-set computed by the analysis at that point 
will contain every sub-environment of R reachable at that point. 


6 Translation to Classical Lock-Based Programs 


In this section we address the question of why an execution-preserving trans- 
lation to a classical lock-based program is not a fruitful route to take. In a 
nutshell, such a translation would not preserve races and would induce a sync- 
CFG with many unnecessary MSW edges, leading to much more imprecise facts 
than the analysis on the native sync-CFG described in the previous section. 
We also describe how our approach can be viewed as a lightweight translation 
of an interrupt-driven program to a classical lock-based one. The translation 
is “lightweight” in the sense that it does not attempt to preserve the execution 
semantics of the given interrupt-driven program, but instead preserves races and 
the sync-CFG structure of the original program. 


6.1 Execution-Preserving Lock Translation 


One could try to translate a given interrupt-driven program P into a classical 
lock-based program P^L in a way that preserves the interleaved execution 
semantics of P. By this we mean that every execution of P has a corresponding 
execution in P^L that follows essentially the same sequence of interleaved 
instructions from the different threads (modulo of course the synchronization 
statements which may differ); and vice-versa. For example, to capture the semantics 
of disableint-enableint, one could introduce an “execution” lock E which is 
acquired in place of disabling interrupts, and released in place of enabling 
interrupts. Every instruction in a task thread outside a disableint-enableint block 
must also acquire and release E immediately before and after the instruction. 
Note that the latter step is necessary if we want to capture the fact that once 
a thread disables interrupts it cannot be preempted by any thread. Figure 5a 
shows an interrupt-driven program P_1 and Fig. 5b its lock translation P_1^L. 
There are still issues with the translation related to re-entrancy of locks and it 
is not immediately clear how one would handle flag-based synchronization, but 
let us set this aside for now. 

The first problem with this translation is that it does not preserve race 
information. Consider the program P_1 in Fig. 5a and its translation P_1^L. The original 
program clearly has a race on x in statements 4 and 9. However the translation 
P_1^L does not have a race as the accesses are protected by the lock E. Hence 
checking for races in P^L does not substitute for checking in P. A way 
around this would be to first construct P' (recall that this is the version of P 
in which we introduce the skip-blocks around statements we want to check for 
races), then construct its lock translation (P')^L, and check this program for 
high-level races on the introduced skip-blocks. However this is expensive as it 
involves a 3x blow-up in going from P to P' and another 3x blow-up in going 
from P' to (P')^L. Further, checking for high-level races (for example using a 
lock-set analysis) is more expensive than just checking for races. In contrast, as 
we show next, our lock-set analysis on the native program P does not incur any 
of these expenses. 


(a) Example program P_1 

main: 
 1. x := y := t := 0; 
 2. create(t1); 
 3. create(t2); 
t1: 
 4. x := x+1; 
 5. disableint; 
 6. x := y; 
 7. enableint; 
t2: 
 8. disableint; 
 9. t := x; 
10. enableint; 

(b) Exec-preserving trans. P_1^L 

main: 
 1. x := y := t := 0; 
 2. spawn(t1); 
 3. spawn(t2); 
t1: 
 4. lock(E); 
 5. x := x+1; 
 6. unlock(E); 
 7. lock(E); 
 8. x := y; 
 9. unlock(E); 
t2: 
10. lock(E); 
11. t := x; 
12. unlock(E); 

(c) Lightweight trans. P_1^W 

main: 
 1. x := y := t := 0; 
 2. spawn(t1); 
 3. spawn(t2); 
t1: 
 4. x := x+1; 
 5. lock(A); 
 6. x := y; 
 7. unlock(A); 
t2: 
 8. lock(A); 
 9. t := x; 
10. unlock(A); 


Fig. 5. Example program P_1, and its lock and lightweight translations P_1^L, P_1^W. 


The second problem with a precise lock translation is that the sync-CFG of 
the translated program has many unnecessary MSW-edges, leading to 
imprecision in the ensuing analysis. Consider the program P_2 in Fig. 6, and its lock 
translation P_2^L in Fig. 7. P_2 is similar to P_1 except that line 4 is now an increment 
of y instead of x, and the resulting program is race-free (in fact HB-race-free). 
Notice that the may-sync-with edges from line 13 to 4, and line 6 to 10 in the 
sync-CFG of P_2^L in Fig. 7 are unnecessary (they are not present in the native 
sync-CFG) and lead to imprecise facts in an interval analysis on this graph. Some 
of the final facts in an interval analysis on these graphs are shown alongside the 
programs in Figs. 6 and 7. In particular the analysis on P_2^L is unable to prove 
the assertion in line 10 of the original program. 


6.2 A Lightweight Lock-Translation 


Our disjoint block-based approach of Sect. 5 can be viewed as a lightweight lock 
translation which does not attempt to preserve execution semantics, but pre- 
serves disjoint blocks and hence also races and the sync-CFG structure of the 
original interrupt-driven program. 


main: 
 1  x := y := t := 0; 
 2  create(t1); 
 3  create(t2); 

t1: 
                         x = y = t = 0 
 4  y := y+1; 
 5  disableint;          0 ≤ x, y, t ≤ 1 
 6  x := y; 
 7  enableint;           0 ≤ x, y, t ≤ 1 

t2: 
 8  disableint; 
 9  t := x; 
10  // assert (t<=1) 
11  enableint;           0 ≤ x, y, t ≤ 1 

(Sync-CFG edges and the remaining interval facts are not recoverable from the source.) 


Fig. 6. Program P_2 with its Sync-CFG and facts from an interval analysis 


main: 
 1  x := y := t := 0; 
 2  spawn(t1); 
 3  spawn(t2); 

t1: 
 4  lock(E); 
 5  y := y+1;            0 ≤ x, t 
 6  unlock(E);           1 ≤ y 
 7  lock(E);             0 ≤ x, y, t 
 8  x := y; 
 9  unlock(E);           0 ≤ x, y, t 

t2: 
                         0 ≤ x, y, t 
10  lock(E); 
11  t := x; 
12  // assert (t<=1) 
13  unlock(E);           0 ≤ x, y, t 

(Sync-CFG edges and the remaining interval facts are not recoverable from the source.) 


Fig. 7. Lock translation P_2^L of P_2, with its Sync-CFG and interval analysis facts 


Let us first spell out the translation. Let us fix an interrupt-driven program 
P = (V,T). The idea is simply to introduce a lock corresponding to each pattern 
of disjoint block pairs listed in Fig. 4, and to insert at the entry and exit to these 
blocks an acquire and release (respectively) of the corresponding lock. For each 
of the cases (a) through (h) we introduce locks named A through H, with some 
exceptions. Firstly, for case (f) regarding the create of a thread t, we simply 
translate these as a spawn(t) command in a classical lock-based programming 
language, which has a standard acquire-release semantics. Secondly, for case (h), 
we need a copy of H for each thread t, which we call H_t. This is because the 
concerned blocks (say between a set and unset of the flag f) are not disjoint 
across task threads, but only with the “then” block of an ISR thread statement 
that checks if f = 0. The ISR thread now acquires the set of locks {H_t | t ∈ T} 
at the beginning of the “then” block of the if statement, and releases them at 
the end of that block. We call the resulting classical lock-based program P^W. 
Figure 5c shows this translation for the program P_1. 
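
A small part of this translation can be sketched as a statement-by-statement rewrite. The Python sketch below is deliberately partial and rests on assumptions: it covers only the patterns (a), (e) and (f), represents statements as plain strings, and uses the hypothetical names `LOCK_OF` and `translate`. The locks needed for the other patterns (for instance the additional locks a D block would take for cases (b) and (d), or the per-thread locks H_t) are not modelled, and the lock named E here is the lock for pattern (e), not the execution lock of Sect. 6.1.

```python
# Partial mapping for patterns (a) and (e) only (an assumed simplification).
LOCK_OF = {
    "disableint": "lock(A)",
    "enableint": "unlock(A)",
    "suspendsch": "lock(E)",
    "resumesch": "unlock(E)",
}


def translate(body):
    """Rewrite the statements of one thread into their lock-based counterparts."""
    out = []
    for stmt in body:
        if stmt in LOCK_OF:
            out.append(LOCK_OF[stmt])
        elif stmt.startswith("create("):
            # pattern (f): thread creation becomes a spawn
            out.append("spawn(" + stmt[len("create("):])
        else:
            out.append(stmt)
    return out
```

Applied to thread t1 of P_1, `translate(["x := x+1", "disableint", "x := y", "enableint"])` produces the statements of t1 in Fig. 5c.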

Figure 8 shows this translation along with the sync-CFG edges and some of 
the final facts in an interval analysis for the program P_2. 

It is not difficult to see that P^W allows all executions that are possible in P. 
However it also allows more: for example the execution of P_1^W (Fig. 5c) in which 
thread t1 preempts t2 at line 9 to execute the statement at line 4, is not allowed 
in P_1. Thus it only weakly captures the execution semantics of P. However, every 
race in P is also a race in P^W. To see this, suppose we have a race on statements 
s and t in P. This means there is a high-level race on the two skip blocks around 
s and t in the augmented program P'. Since an execution exhibiting the 
high-level race on these blocks would also be present in (P')^W, which is identical to 
(P^W)', it follows that the corresponding statements are racy in P^W as well. 

Further, since our translation preserves disjoint blocks by construction, if s 
and t are in disjoint blocks in P, the corresponding statements will be in disjoint 
blocks in P^W; and vice-versa. It follows that the sync-CFGs induced by P and 
P^W are essentially isomorphic (modulo the synchronization statements). As a 
result, any value-set-based analysis will produce identical results on the two 
graphs. 

Finally, if statements s and t are HB-racy in P, they must also be HB-racy 
in P^W. This is because disjoint blocks are preserved and the synchronizes-with 
relation is inherited from the disjoint blocks. Hence the execution witnessing the 
HB-race in P would also be present in P^W, and would also witness a HB-race 
on the corresponding statements. 

We summarize these observations below: 


Proposition 1. Let P be an interrupt-driven program and P^W the classical lock 
program obtained using our lightweight lock translation. Then: 


1. If statements s and t are racy in P, the corresponding statements are racy in 
P^W as well. 

2. If statements s and t are HB-racy in P, the corresponding statements are 
HB-racy in P^W as well. 

3. The sync-CFGs induced by P and P^W are essentially isomorphic. As a result 
the final facts in a value-set-based analysis on these graphs will be identical. 


main: 
 1  x := y := t := 0; 
 2  spawn(t1); 
 3  spawn(t2); 

t1: 
                         x = y = t = 0 
 4  y := y+1; 
 5  lock(A);             0 ≤ x, y, t ≤ 1 
 6  x := y; 
 7  unlock(A);           0 ≤ x, y, t ≤ 1 

t2: 
 8  lock(A); 
 9  t := x; 
10  // assert (t<=1) 
11  unlock(A);           0 ≤ x, y, t ≤ 1 

(Sync-CFG edges and the remaining interval facts are not recoverable from the source.) 


Fig. 8. Our lightweight translation P_2^W of P_2, with its Sync-CFG and interval analysis 
facts 


6.3 Lockset Analysis for Race Detection 


For classical lock-based programs, the lockset analysis [24] essentially tracks 
whether two statements are in disjoint blocks. Here two blocks are disjoint if 
they hold the same lock for the duration of the block. When two statements are 
in disjoint blocks, they are necessarily happens-before ordered, and hence this 
gives us a way to declare pairs of statements to be non-HB-racy. 

A lockset analysis computes the set of locks held at each program point as
follows. At program entry it is assumed that no locks are held. When a call to
acquire(l) is encountered, the analysis adds the lock l to the lockset at the
out point of the call. When a call to release(l) is encountered, the lockset at
the out point of the call is the lockset computed at the in point with the lock
l removed. For any other statement, the lockset at the in point of the
statement is copied to its out point. The join operation is the intersection
of the input locksets. Once locksets are computed at each point, a pair of
conflicting statements s and t in different threads is declared to be
may-HB-racy if the locksets held at these points have no lock in common.
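The lockset computation described above can be sketched as a small worklist algorithm. The Python below is only an illustrative sketch: the CFG encoding (a dictionary of nodes tagged "acquire", "release" or "other") and the name lockset_analysis are assumptions made for this example, not the representation used by any particular tool.

```python
def lockset_analysis(nodes, edges, entry):
    """Return the lockset at the out point of every CFG node.

    nodes: dict node_id -> (kind, lock), kind in {"acquire", "release", "other"}
           (hypothetical encoding chosen for this sketch)
    edges: list of (src, dst) control-flow edges
    entry: id of the entry node; no locks are held on entry
    """
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)

    out = {n: None for n in nodes}  # None means "not reached yet"
    worklist = [entry]
    while worklist:
        n = worklist.pop()
        # Join: intersect the locksets of all predecessors reached so far.
        incoming = [out[p] for p in preds[n] if out[p] is not None]
        if n == entry:
            incoming.append(frozenset())
        if not incoming:
            continue
        in_set = frozenset.intersection(*incoming)

        kind, lock = nodes[n]
        if kind == "acquire":
            new_out = in_set | {lock}
        elif kind == "release":
            new_out = in_set - {lock}
        else:
            new_out = in_set  # other statements copy the in lockset

        if new_out != out[n]:
            out[n] = new_out
            worklist.extend(succs[n])
    return out
```

For a statement that is neither an acquire nor a release, the out lockset equals the in lockset, so out[n] is also the set of locks held while n executes.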

Using our lock translation above, we can detect races as follows. Given an 
interrupt-driven program P, we first translate it to the lock-based program PW, 
and do a lockset analysis on PW. If any pair of conflicting statements s and t
is found to be may-HB-racy in PW, we declare them to be may-HB-racy in P.
By Proposition 1(2), it follows that this is a sound analysis for interrupt-driven 
programs. 


7 Analyzing the FreeRTOS Kernel Library 


We now perform an experimental evaluation of the proposed race detection
algorithm and sync-CFG-based relational analysis for interrupt-driven programs.
We use the FreeRTOS kernel library [3], on which our interrupt-driven program
semantics are based, for our evaluation. FreeRTOS is a collection of functions,
mostly written in C, that an application developer compiles with and invokes in
the application code. We view the FreeRTOS kernel library as an interrupt-driven
program as follows: we build an interrupt-driven program out of the FreeRTOS
kernel as shown in the figure alongside. The main thread is responsible for
initializing the kernel data structures and then creating two threads: a task
thread which branches out calling each task kernel API function, and loops on
this; and an ISR thread which similarly branches and loops on the ISR kernel API
functions. FreeRTOS provides versions of API functions that can be called from
interrupt service routines. These functions have “FromISR” appended to their
name. While it is sufficient to have one ISR thread, we assume (in the analysis)
that there could be any number of task threads running. To achieve this we
simply add sync-edges within each task kernel function, in addition to the usual
sync-edges between task functions. We used FreeRTOS version 10.0.0 for our
experiments. We conducted these experiments on an Intel Core i7 machine with
32GB RAM running Ubuntu 16.04.


7.1 Race Detection 


We consider 49 task and queue API functions that can be called from an appli- 
cation (termed top-level functions) for race detection. The functions operating 
on semaphores and mutexes were not considered. 
We prepared the API functions for analysis in two steps: (1) inlining and
(2) lock insertion. The function vTaskStartScheduler and the queue
initialization code in the function xQueueGenericCreate were treated as part of
the main thread, which initializes kernel data structures. All the helper function
calls made inside the top-level functions were inlined. After inlining, the functions
were modified to acquire and release locks using the strategy explained in Sect. 6.2.
We consider each pair of disjoint blocks as taking the same distinct lock. For
example, the pair of disjoint blocks protected by disableint-enableint take
lock A. That is, disableint is replaced with acquire(A) and enableint is
replaced with release(A). A total of 9 locks corresponding to disjoint blocks
were employed in the modification of the FreeRTOS code. The two steps outlined
above are automated. Inlining is achieved using the inline pass in the CIL
framework [22]. Lock insertion is accomplished using a script.
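The lock insertion step can be pictured as a textual rewrite of block delimiters into lock operations. The following Python is a minimal sketch under stated assumptions, not the script used in the experiments: only the disableint/enableint pair and lock A come from the text, while the second pair (suspendsch/resumesch), lock B, and the function name insert_locks are hypothetical placeholders.

```python
import re

# Mapping from block delimiters to lock operations. Only the first pair is
# taken from the text; the second pair and lock B are made-up placeholders.
LOCK_FOR_DELIMITER = {
    "disableint": "acquire(A)",
    "enableint": "release(A)",
    "suspendsch": "acquire(B)",
    "resumesch": "release(B)",
}

# Matches a call such as disableint() as a whole word.
_CALL = re.compile(r"\b(" + "|".join(LOCK_FOR_DELIMITER) + r")\(\)")


def insert_locks(source):
    """Replace every block delimiter call in source by its lock operation."""
    return _CALL.sub(lambda m: LOCK_FOR_DELIMITER[m.group(1)], source)
```

For instance, insert_locks("disableint(); x = y; enableint();") yields "acquire(A); x = y; release(A);", so that the two ends of a disjoint block become an acquire and a release of the same lock.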

The modified code, which has over 3.5K lines of code, is used for race
detection. We tracked 24 variables and checked whether the statements accessing them
are racy. These variables include fields in the queue data-structure, task con- 
trol block, and queue registry, as well as variables related to tasks. FreeRTOS 
maintains lists for the states of the tasks like “ready”, “suspended”, “waiting to 
send”, etc. The pointers to these lists are also analysed. Access to any portion 
of a list (like the delayed list) is treated as an access of a corresponding variable 
of the same name. 

Races are detected in this modified FreeRTOS code in three steps: (1) compute
the locks held, (2) identify whether each access to a variable is a read or a
write, and (3) report potential races. First, the lockset analysis explained in
Sect. 6.3, which computes the locks held at each access to a variable, is
implemented as a pass in CIL. The modified FreeRTOS code is analyzed using this
new pass, and the lockset at each access to the 24 variables of interest is
computed. Then a writes pass, also implemented in CIL, is run on the modified
FreeRTOS code to identify whether each access to a variable is a “read” or a
“write”. Finally, a shell script interprets the results of the two previous
steps and reports potential races. The script identifies the conflicting access
pairs (using the writes pass) and the locks held by the conflicting accesses
(using the lockset pass).
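The reporting step combines the outputs of the two passes. As a hedged illustration (the record format and the name report_may_races are assumptions made for this sketch, and the actual step is a shell script), it could look like this in Python:

```python
from itertools import combinations


def report_may_races(accesses):
    """Return (stmt1, stmt2, var) triples for potentially racy access pairs.

    accesses: list of dicts with the hypothetical keys
      thread  - name of the thread containing the access
      stmt    - statement label
      var     - variable accessed
      kind    - "read" or "write" (result of the writes pass)
      lockset - set of locks held at the access (result of the lockset pass)
    """
    races = []
    for a, b in combinations(accesses, 2):
        # Two accesses conflict if they touch the same variable and at least
        # one of them is a write.
        conflicting = (a["var"] == b["var"]
                       and "write" in (a["kind"], b["kind"]))
        in_different_threads = a["thread"] != b["thread"]
        no_common_lock = not (a["lockset"] & b["lockset"])
        if conflicting and in_different_threads and no_common_lock:
            races.append((a["stmt"], b["stmt"], a["var"]))
    return races
```

A pair is reported only when the locksets are disjoint, so two accesses to the same variable that both hold lock A are not flagged.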

Our analysis reports 64 pairs of conflicting accesses as being potentially
racy. On manual inspection we classified 18 of them as real races and the
rest as false positives. Table 2 summarizes our findings. The second column
in the table lists the variables of interest involved in the race, like various 
task list pointers, queue registry fields pcQueueName and xHandle, task vari- 
able uxCurrentNumberOfTasks, tick count xTickCount, etc. The third column 
lists the functions in which the conflicting accesses are made and the fourth gives 
the number of racing pairs. The fifth column assesses the potential races based 
on our manual inspection of the code. The analysis took 3.91s. 

The false positives were typically due to the fact that we had abstracted 
data-structures (like the delayed list which is a linked-list) by a synonymous 
variable. Thus even if the accesses were to different parts of the structure (like 
the container field of a list item and the next pointer of a different list item) our
analysis flagged them as races. 

We were in touch with the developers of FreeRTOS regarding the 18 pairs 
we classified as true positives. The 14 races on the queue registry were deemed 
to be non-issues as the queue delete function is usually invoked only once the 
application is about to terminate. The 2 races on uxCurrentNumberOfTasks are 
known (going by comments in the code) but are considered benign as the variable 
is of “base type”. The remaining couple of races on the delayed task lists appear 
to be real issues as they have been fixed (independent of our work) in v10.1.1. 


7.2  Region-Based Relational Analysis 


Our aim here is to do a region-based interval and polyhedral analysis of a region- 
race-free subset of the FreeRTOS kernel APIs, and to prove some simple asser- 
tions about the kernel variables in each region. 

We first identified six regions for this purpose. One region corre- 
sponds to variables protected by disabling interrupts (like xTickCount, 
xNextTaskUnblockTime, etc.), while variables protected by suspend and resume 
scheduler commands (like uxPendedTicks, xPendingReadyList, etc.) are in 
another region. Fields of the queue structure like pcHead, pcTail, etc. are in 
a third region, while the waiting lists for a queue form another region. The 
queue registry fields like pcQueueName and xHandle are in the fifth region. The pointer
variable pxCurrentTCB, pointing to the current Task Control Block (TCB), is 
put in the sixth region. 

The FreeRTOS code was modified further to reflect access to regions. For
this, new variables $R_1, \ldots, R_6$ are declared. Wherever there is a write (or read)
access to a variable in region $i$, an assignment statement that defines (or reads
from) variable $R_i$ is inserted just before the access. This is done using a script
which takes the result of the writes pass to find where in the source code an 
appropriate assignment statement has to be inserted. We selected 15 APIs that 
did not contain any region races. 

Next, we prepared the API functions for the analysis in two steps. They are 
described below: 


Abstraction of FreeRTOS API Functions. We abstracted the FreeRTOS source 
code to prepare it for the relational analysis. In this abstraction, we basically 
model the various lists (ready list, delayed list) by their lengths and the value at 
the head of the list (if required). Using this abstraction, we are able to convert 
list operations to operations on integers. 

Similarly, to model insertion into a list, we abstract it by incrementing the 
variable which represents the length of the list. We abstracted all the API func- 
tions in a similar fashion. 
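To make the list abstraction concrete, the sketch below models a kernel list by two integers, its length and the value at its head. It is an illustration only: the class name AbstractList, the UNKNOWN marker, and the assumption that the list is kept sorted (so that the head is the minimum) are choices made for this example, whereas the abstraction described above was applied directly to the C source.

```python
UNKNOWN = object()  # head value that the abstraction no longer tracks


class AbstractList:
    """Abstracts a sorted list by its length and, when known, its head value."""

    def __init__(self):
        self.length = 0
        self.head = None  # there is no head while the list is empty

    def insert_sorted(self, value):
        # Insertion is abstracted by incrementing the length; the head of a
        # sorted list is the minimum of the old head and the new value.
        if self.length == 0:
            self.head = value
        elif self.head is not UNKNOWN:
            self.head = min(self.head, value)
        self.length += 1

    def remove_head(self):
        assert self.length > 0, "cannot remove from an empty list"
        self.length -= 1
        # The abstraction does not remember the second element, so the new
        # head is unknown unless the list became empty.
        self.head = None if self.length == 0 else UNKNOWN
```

With this encoding, list operations become operations on integers, which is what allows numerical domains to reason about them.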


Creation of the Sync-CFG. The next step is to create a sync-CFG out of the 
abstracted program. For doing this, we used the abstracted version of the FreeR- 
TOS code (along with acquire-release added as explained in Sect. 7.1). 
Table 2. Potential races


Variables | Functions | #Race pairs | Remark
pxDelayedTaskList | eTaskGetState, xTaskIncrementTick | 1 | Real race. Read of pxDelayedTaskList in eTaskGetState while it is written to in xTaskIncrementTick
pxOverflowDelayedTaskList | eTaskGetState, xTaskIncrementTick | 1 | Real race (similar to the above)
uxCurrentNumberOfTasks | xTaskCreate, uxTaskGetNumberOfTasks | 2 | Real race. Unprotected read in uxTaskGetNumberOfTasks while it is written to in xTaskCreate
pcQueueName, xHandle | vQueueDelete, pcQueueGetName, vQueueAddToRegistry | 14 | Real race. Unprotected accesses in queue registry functions
xTasksWaitingToSend, xTasksWaitingToReceive | eTaskGetState, xQueueGenericReset | 2 | False positive. Initialization of vars when queue is created
pxDelayedTaskList, pxOverflowDelayedTaskList, xSuspendedTaskList, pxCurrentTCB | 9 functions like xTaskCreate, eTaskGetState, etc. | 11 | False positive. Initialization of vars when the first task is created
pxDelayedTaskList, pxOverflowDelayedTaskList, xSuspendedTaskList, xTasksWaitingToSend, xTasksWaitingToReceive | 13 functions like vTaskDelay, eTaskGetState, etc. | 33 | False positive. The accesses are to disjoint portions of the lists


Next, we used a script to insert non-deterministic gotos from the point of 
release of a lock to the acquire of the same lock. Since we are using gotos for 
creation of sync-CFG, we keep all the API functions in main itself and evaluate 
a non-deterministic “if” condition before entering the code for an API function. 
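The placement of these gotos can be sketched as a computation of release-to-acquire edges. The Python below is a hypothetical illustration (the tuple format and the name sync_edges are assumptions for this example): it pairs every release of a lock with every acquire of the same lock, optionally including pairs inside the same thread, which corresponds to the extra sync-edges added within task functions to account for several task threads.

```python
def sync_edges(sync_points, within_thread=False):
    """Return (release_label, acquire_label) pairs for the same lock.

    sync_points: list of (thread, label, op, lock) tuples, where op is
                 "acquire" or "release" (hypothetical encoding).
    within_thread: also pair a release with acquires of the same thread.
    """
    releases = [p for p in sync_points if p[2] == "release"]
    acquires = [p for p in sync_points if p[2] == "acquire"]
    edges = []
    for r_thread, r_label, _, r_lock in releases:
        for a_thread, a_label, _, a_lock in acquires:
            same_lock = r_lock == a_lock
            if same_lock and (within_thread or r_thread != a_thread):
                edges.append((r_label, a_label))
    return edges
```

Each returned pair would then be materialized as a non-deterministic goto from the release point to the acquire point.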


Results. For the purpose of analysis we listed out some numerical relations 
between kernel variables in the same region, which we believed should hold. 
We identified a total of 15 invariants including 4 invariants which involve rela- 
tions between kernel variables. We then inserted assertions for these invariants 
at the key points in our source code like the exit of a block protecting a region. 

We have implemented an interval-based value-set analysis and a region-based 
octagon and polyhedral analysis for C programs using CIL [22] as the front-end 
and the Apron library (version 0.9.11) [16]. We represent the sync-with edges of 
the sync-CFG of a program using goto statements from the source (release) to 
the target (acquire) of the may-synchronizes-with (MSW) edges. 

We ran our implementation on the abstracted version of the FreeRTOS kernel 
library, with the aim of checking how many of the invariants it was able to prove. 
The abstracted code along with addition of gotos is about 1500 lines of code. 
We did a preliminary interval analysis on this abstracted sync-CFG and were 
able to prove 11 out of these 15 invariants. With a widening threshold of 30, 
the interval analysis takes under 5 min to run. As expected, the interval analysis 
could not prove the relational invariants. 
We then did a region-based polyhedral analysis using the six regions identified
above. For the region-based analysis, we used the convex polyhedra domain with a
widening threshold of 30. It is able to prove all the assertions we believed to be
true. The analysis takes about 30 min to complete with the convex polyhedra
domain and about 20 min with the octagon domain.

The results obtained by our analysis are shown in Table 3. 


Table 3. Relational analysis results


Assertion | Interval Anal | Region Anal (Oct/Polyhedral)
xTickCount ≤ xNextTaskUnblockTime | No | Yes
head(pxDelayedTaskList) = xNextTaskUnblockTime | No | Yes
head(pxDelayedTaskList) ≥ xTickCount | No | Yes
uxMessagesWaiting ≤ uxLength | No | Yes
uxMessagesWaiting ≥ 0 | Yes | Yes
uxCurrentNumberOfTasks ≥ 0 | Yes | Yes
lenpxReadyTasksLists ≥ 0 | Yes | Yes
uxTopReadyPriority ≥ 0 | Yes | Yes
lenpxDelayedTaskList ≥ 0 | Yes | Yes
lenxPendingReadyList ≥ 0 | Yes | Yes
lenxSuspendedTaskList ≥ 0 | Yes | Yes
cRxLock ≥ -1 | Yes | Yes
cTxLock ≥ -1 | Yes | Yes
lenxTasksWaitingToSend ≥ 0 | Yes | Yes
lenxTasksWaitingToReceive ≥ 0 | Yes | Yes


8 Related Work


We classify related work based on the main topics touched upon in this paper.


Data Races. Adve and Hill [1] introduce the notion of a data race using a 
happens-before relation, and identify instructions that form release-acquire pairs, 
for low-level concurrent programs. Boehm and Adve [4] define races in terms of 
consecutive occurrences in a sequentially consistent execution, as well as using 
a happens-before order, in the context of the C++ semantics. They show their 
notions are equivalent as far as race-free programs go. As pointed out earlier, 
the definition of races as consecutive occurrences is inadequate in our setting. 
Schwarz et al. [26] define a notion of data race for priority-based interrupt-driven 
programs, where there is a single main task and multiple ISRs. A race occurs 
when the main thread is accessing a variable at a certain dynamic priority, and an 
ISR thread with higher priority also accesses the variable. Our definition can be 
seen to be stronger and more accurately captures racy situations. In particular,
if the ISR thread with higher priority does not actually execute the conflicting
access, due to say a condition not being enabled, then we would not call it a 
race. The term “high-level” race was coined by Artho et al. [2]. Our definition 
of a high-level race follows that of [20]. 


Analysis of Interrupt-Driven Programs. Regehr and Cooprider [23] describe a 
source-to-source translation of an interrupt-driven program to a standard multi- 
threaded program, and analyze the translated program for races. Their trans- 
lation is inadequate for our setting in many ways: in particular, disable-enable 
of interrupts is translated by acquiring and releasing all ISR-specific locks; how- 
ever this does not prevent interaction with another task while one task has 
disabled interrupts. In [8] they also describe an analysis framework for constant- 
propagation analysis on TinyOS applications. They use a similar idea of adding 
“control-flow” edges between disable-enable blocks and ISRs. However no sound- 
ness argument is given, and other kinds of blocks (suspend/resume, flag-based 
synchronization) are not handled. The works in [5,6,13] analyze timing prop- 
erties, interrupt-latency, and stack sizes for interrupt-driven programs, using 
model-checking, algebraic, and algorithmic approaches. Schwarz et al. [25,26] 
give analyses for race-detection and invariants based on linear-equalities for their 
aforementioned class of priority-based interrupt-driven programs. Our work
differs in several ways: their analysis is directed towards applications (we
target libraries, where task priorities do not matter); their analyses are
specific (we provide a basis for carrying out a variety of value-set and
relational analyses, targeting race-free programs); and they consider priority
and flag-based synchronization (but not disable-enable and suspend-resume based
synchronization). Sung et al. [27] consider interrupt-driven applications in the
form of ISRs with different priorities, and perform interval-based static
analysis for checking assertions. They do not handle libraries and do not
leverage race-freedom. Finally,
[20] uses a model-checking approach to find all high-level races in FreeRTOS 
with a completeness guarantee. 


Analysis of Race-Free Programs. Chugh et al. [7] use race information to do 
thread-modular null-dereference analysis, by killing facts at a point whenever a 
notional read of a variable is found to be racy. De et al. [11] propose the sync- 
CFG and value-set analysis for race-free programs, while Mukherjee et al. [21] 
extend the framework to region and relational analyses. Gotsman et al. [12] and 
Miné et al. [18,19] define relational shape/value analyses for concurrent programs 
that exploit race-freedom and lock invariants respectively. All these works are for 
classical lock-based synchronization while we target interrupt-driven programs. 


9 Conclusion 


In this paper our aim has been to give efficient static analyses for classes of 
non-standard concurrent programs like interrupt-driven kernels, that exploit the 
property of race-freedom. Towards this goal, we have proposed a definition of 
data races which we feel is applicable to general concurrent programs. We have
also proposed a general principle for defining synchronizes-with edges, which is 
the key ingredient of a happens-before relation, based on the notion of disjoint 
blocks. We have implemented our theory to perform sound and effective static 
analysis for race-detection and invariant inference, on the popular real-time ker- 
nel FreeRTOS. 

We feel this framework should be applicable to other kinds of concurrent 
systems, like other embedded kernels (for example TI-RTOS [14]) and appli- 
cation programs, and event-driven programs. There are additional challenges in 
these systems like priority-based preemption and priority inheritance conventions 
which need to be addressed. Apart from investigating these systems we would 
like to apply this theory to perform other static analyses like null-dereference, 
points-to, and shape analysis, for these non-standard classes of concurrent 
programs. 
Abstract. We present an abstract domain able to infer invariants on
programs manipulating trees. Trees considered in the article are defined 
over a finite alphabet and can contain unbounded numeric values at their 
leaves. Our domain can infer the possible shapes of the tree values of each 
variable and find numeric relations between: the values at the leaves as 
well as the size and depth of the tree values of different variables. The 
abstract domain is described as a product of (1) a symbolic domain based 
on a tree automata representation and (2) a numerical domain lifted, for 
the occasion, to describe numerical maps with potentially infinite and 
heterogeneous definition set. In addition to abstract set operations and 
widening we define concrete and abstract transformers on these environ- 
ments. We present possible applications, such as the ability to describe 
memory zones, or track symbolic equalities between program variables. 
We implemented our domain in a static analysis platform and present 
preliminary results analyzing a tree-manipulating toy-language. 


1 Introduction 


The abstract interpretation framework [5] enables the development of sound 
static analyzers by inferring and proving invariants on reachable states of pro- 
grams. Invariants in the scope of abstract interpretation are elements of a lattice 
called an abstract domain. Most domains focus on numeric or pointer variables. 
By contrast, we propose an abstract domain for variables whose values are tree 
data-structures. Tree values appear natively in some languages (such as OCaml) 
and applications (such as the DOM in web programming) or can be encoded 
through pointer manipulations (as in C). Trees can abstract terms in logic pro- 
gramming. A tree domain can also be useful to collect symbolic expressions 
appearing in a program. 
typedef struct node {
  int data;
  struct node* next;
} node;

node* append(node* head, int data) {
  if (head == NULL) {
    return (create(data, NULL));
  } else {
    node *cursor = head;
    while (cursor->next != NULL)
      cursor = cursor->next;
    node* new_node = create(data, NULL);
    cursor->next = new_node;
    return head;
  }
}

Program 1: Append to list in C


float golden_ratio(int n) {
  int i = 0;
  float r = 1;
  while (i < n) {
    r = 1 + 1 / r;
    i += 1;
  }
  return r;
}

Program 2: Golden ratio in C


let rec f x n =
  match n with
  | 0 -> []
  | _ -> (x+1) :: (x-1) :: (f x (n-1))

let () =
  (* Assume x:int and n:int>=0 *)
  let t = f x n in
  match t with
  | [] -> ()
  | p :: q when p > x -> ()
  | _ -> assert false

Program 3: List type in OCaml


Used Memory Zones. Program 1 describes an append function defined in the C
language; this function adds an integer at the end of a linked list. The infinite
set of unbounded terms of the form *(*( ...* (head + 4) ...+ 4) + 4) represents 
memory zones that are used by the append function. Our analyzer is able to infer 
and represent such sets of terms. This provides the information that Program 1 
does not use any of the data field of the linked list. Such a function would be 
fairly commonly called in a real-life project. In a classical top-down static analy- 
sis by abstract interpretation, function calls are inlined at each call site. A way to 
improve scalability is to design modular analyzers able to reuse previous analysis 
results (as emphasized in [7]). In order to be able to successfully reuse function 
body analysis, input states must be unified. Moreover the cost of performing the 
analysis of the body of functions grows with the number of variables that need to 
be tracked. A common way to deal with both problems is to use framing on the 
inputs of the functions (as in separation logic [25]). This improves (1) precision,
as we know that the framed-out parts of the state are not modified by the function
call; (2) body analysis efficiency, as the input state is reduced; and finally
(3) modularity, as constraints on the usage of the first analysis are relaxed by
the removal of constraints.


Symbolic Relations. Program 2 is a C function computing an approximation of
the golden ratio (as it is the limit of the sequence $r_0 = 1$, $r_{n+1} = 1 + \frac{1}{r_n}$). As
classical numerical domains cannot represent such numerical relations, methods
were proposed to track symbolic equalities between expressions (see [23]). However,
such methods cannot handle the unbounded iteration of Program 2. The set of
reachable states at the end of Program 2 can be expressed by r = 1 + 1/(1 +
1/...1...) with depth n. Note that to infer such results we need to express
numerical relations between the size of trees and the numeric variables of the
program.
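The link between the loop counter and the shape of the symbolic value of r can be made concrete with a small sketch. The Python below is an illustration only (the nested-tuple encoding of terms and the names golden_term, depth and evaluate are assumptions for this example): it builds the term reached after n iterations of the loop, measures its depth, and evaluates it.

```python
def golden_term(n):
    """Symbolic value of r after n iterations, as nested (op, left, right) tuples."""
    term = 1
    for _ in range(n):
        term = ("+", 1, ("/", 1, term))
    return term


def depth(term):
    """Depth of a term; numeric leaves have depth 0."""
    if not isinstance(term, tuple):
        return 0
    return 1 + max(depth(son) for son in term[1:])


def evaluate(term):
    """Numeric value denoted by a term built from + and /."""
    if not isinstance(term, tuple):
        return float(term)
    op, left, right = term
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l / r
```

In this encoding each iteration adds one "+" node and one "/" node, so the depth of the term is a linear function of n. This is the kind of relation between a tree measure and a numeric program variable that the paragraph above refers to.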
Numerical Environment. Consider now the OCaml Program 3: we want to prove
that the assert false expression is never reached. This program builds a list
of size 2*n with alternating values x+1 and x-1. The assertion states that the
head of the list is x+1. After the definition of t there are two types of reachable
states: (1) those that have not gone through the loop,
$(t \mapsto [\,], x \mapsto \mathbb{Z}, n \mapsto 0)$, and (2) those that have gone through at least
one iteration of the loop,
$(t \mapsto [\alpha_1; \alpha_2; \alpha_3; \ldots], x \mapsto \alpha, n > 0, \alpha_1 \mapsto \alpha + 1, \alpha_2 \mapsto \alpha - 1, \alpha_3 \mapsto \alpha + 1)$,
where $\alpha \in \mathbb{Z}$. Therefore we need to be able to keep numerical relations between the
parametric and unbounded number of numeric values appearing in t and numeric
variables from the program. Classical numeric domains do not provide
out-of-the-box abstractions for sets of partially defined numerical functions, therefore
we define such an abstraction. As an example of analysis result, the memory
representation obtained by our analysis for t describes the set of trees of the
form: Cons(a, Cons(b, Cons(a, ..., Nil) ...)) where a = x + 1 and b = x - 1.
Therefore we are able to prove that the assert false expression is never
reached.


Contributions. The main contributions of the article are threefold: (1) The exten- 
sion of results on tree automata to the abstract interpretation framework by 
definition of a widening operator, in order to represent the set of tree shapes 
that a variable can contain. (2) The definition of a numerical domain built upon 
classical abstract domains able to represent sets of partial numerical maps with 
heterogeneous and unbounded definition sets. This is necessary to represent the 
numeric values at the leaves of a set of trees, as trees are unbounded and can 
contain a different number of leaves. (3) The definition of a novel abstraction 
for trees that can contain numerical values at their leaves. This last domain 
combines the abstractions (1) and (2). Moreover it is relational as it can express 
relations between numerical values found in trees and in the rest of the program, 
and relations between trees. Finally all results were implemented in an existing 
framework and experimented on a toy-language. 


Limitations. At this point, analyses can only be performed on the toy language
presented hereafter, not on real-life code; we therefore do not present any
benchmark results, even though examples of analysis results are given.
Indeed, Programs 1, 2 and 3 were precisely analyzed once encoded into our
toy language (see Programs 4 and 5).


Outline. We start, in Sect. 2, by presenting the concrete semantics we want to
abstract. In Sect. 3 we build a first abstraction which forgets numerical values and
focuses on abstracting tree shapes. Section 4 presents a novel numerical abstract
domain required for the definition of the abstract domain of Sect. 5, which aims
at precisely representing numerical constraints between trees and program
variables. In Sect. 6 we provide remarks on the implementation and results of the
analyzer. Finally, Sect. 7 mentions related work while Sect. 8 concludes.
Notations. Classical Galois connections (see [5]) are denoted
$(A, \sqsubseteq_A) \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} (B, \sqsubseteq_B)$.
When no best abstraction can be defined, we use the representation
framework (as defined by Bourdoncle in [3], also known as the concretization-only
framework); representations are denoted by $(A, \sqsubseteq_A) \xleftarrow{\gamma} (B, \sqsubseteq_B)$.
$A \rightharpoonup B$ denotes the set of partial maps from A to B, and
$\lambda x \in A. f(x) \in B$ denotes the map in $A \to B$ that associates f(x) to x.
Finally when $f \in A \to C$ and $g \in B \to C$, with $A \cap B = \emptyset$,
$f \uplus g$ is the function defined on $A \cup B$ that associates f(x)
(resp. g(x)) to x whenever $x \in A$ (resp. $x \in B$).


2 Syntax and Concrete Semantics 


Definition 1. An alphabet F is a finite set; a ranked alphabet is a pair R =
(F, a) where F is an alphabet and $a \in F \to \mathbb{N}$. For $f \in F$, we call arity of f
the value a(f). We assume that $\mathbb{Z}$ and F are disjoint and we define the set of
natural terms over R (denoted $T_{\mathbb{Z}}(R)$) to be the smallest set defined by:


- $\mathbb{Z} \subseteq T_{\mathbb{Z}}(R)$
- $\forall p \geq 0, f \in F, t_1, \ldots, t_p \in T_{\mathbb{Z}}(R),\ a(f) = p \Rightarrow f(t_1, \ldots, t_p) \in T_{\mathbb{Z}}(R)$


Moreover when R contains at least one symbol of arity 0, we define terms over
R (denoted T(R)) to be the smallest set defined by:


- $\forall p \geq 0, f \in F, t_1, \ldots, t_p \in T(R),\ a(f) = p \Rightarrow f(t_1, \ldots, t_p) \in T(R)$


In the following, $F_n$ denotes the subset of F of arity n. Moreover given a term
$t \in T(R)$ we denote $f = head(t) \in F$ and sons(t) a possibly empty tuple
$(t_1, \ldots, t_n)$ of elements of T(R) such that $t = f(t_1, \ldots, t_n)$.


Remark 1. Numerical leaves are defined to contain integers, however this could 
be modified to rationals, real numbers or floats. We are parametric in the type 
of numeric values, as they are delegated to an underlying numerical domain. 


Example 1. Consider the ranked alphabet R = {*(1), &(1), +(2), x(0)}, where u(n)
means that symbol u has arity n. Then $\&x \in T(R)$ and $*(\&x+4) \in T_{\mathbb{Z}}(R)$,
but $*(\&x+4) \notin T(R)$. Using this alphabet we can model C pointer arithmetic.
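The difference between terms and natural terms in this example can be checked mechanically. The following Python is a minimal sketch under assumed conventions (symbolic nodes are encoded as tuples whose first component is the symbol, numeric leaves as Python integers, and the names ARITY, is_natural_term and is_term are introduced only for this illustration):

```python
# Ranked alphabet of the example: symbol -> arity.
ARITY = {"*": 1, "&": 1, "+": 2, "x": 0}


def is_natural_term(t, arity=ARITY):
    """Membership in the set of natural terms: integer leaves are allowed."""
    if isinstance(t, int):
        return True
    symbol, *sons = t
    return (arity.get(symbol) == len(sons)
            and all(is_natural_term(s, arity) for s in sons))


def is_term(t, arity=ARITY):
    """Membership in the set of terms: only symbols of the alphabet are allowed."""
    if isinstance(t, int):
        return False
    symbol, *sons = t
    return (arity.get(symbol) == len(sons)
            and all(is_term(s, arity) for s in sons))


ADDR_X = ("&", ("x",))               # &x
DEREF = ("*", ("+", ADDR_X, 4))      # *(&x+4)
```

Here ADDR_X is accepted by both predicates, while DEREF is a natural term but not a term, because of its integer leaf 4.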


Example 2. $U = \{+(x, y) \mid x < y\}$ and $V = \{+(x, +(z, y)) \mid x < y \wedge z < y\}$ are
two sets of natural terms over R = {+(2)} which we use as running examples.
tree-expr ::= make_symbolic(F, tree-expr, ..., tree-expr)
            | make_integer(expr)
            | get_son(tree-expr, expr)

sym-expr  ::= get_sym_head(tree-expr)

expr      ::= ...
            | get_num_head(tree-expr)
            | is_symbol(tree-expr)
            | sym-expr == ...

stmt      ::= ...
            | T = tree-expr


Fig. 1. Syntax extension of the language


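$$
\begin{aligned}
\mathbb{E}[\![\texttt{make\_symbolic}(s \in F_m, T_1, \ldots, T_m)]\!](E, F) &= \{ s(t_1, \ldots, t_m) \mid \forall i,\ t_i \in \mathbb{E}[\![T_i]\!](E, F) \} \\
\mathbb{E}[\![\texttt{make\_integer}(e \in expr)]\!](E, F) &= \mathbb{E}[\![e]\!](E, F) \\
\mathbb{E}[\![\texttt{is\_symbol}(T)]\!](E, F) &= \{ \mathit{true} \mid \exists t \in \mathbb{E}[\![T]\!](E, F),\ \exists f \in R,\ t = f(\ldots) \} \\
&\quad \cup \{ \mathit{false} \mid \exists t \in \mathbb{E}[\![T]\!](E, F),\ t \in \mathbb{Z} \} \\
\mathbb{E}[\![\texttt{get\_son}(T, e)]\!](E, F) &= \{ t \mid \exists i \in \mathbb{E}[\![e]\!](E, F),\ t' \in \mathbb{E}[\![T]\!](E, F),\ f \in F_{m > i}, \\
&\qquad t' = f(t_0, \ldots, t_{m-1}) \wedge t_i = t \} \\
\mathbb{E}[\![\texttt{get\_num\_head}(T)]\!](E, F) &= \{ i \in \mathbb{Z} \mid \exists t \in \mathbb{E}[\![T]\!](E, F),\ t = i \} \\
\mathbb{E}[\![\texttt{get\_sym\_head}(T)]\!](E, F) &= \{ s \in R \mid \exists t \in \mathbb{E}[\![T]\!](E, F),\ t = s(\ldots) \}
\end{aligned}
$$


Fig. 2. Concrete operations on natural terms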


int i;
int n;
tree y;
assume (n >= 0);
i = 0;
y = make_symbolic("p",{});
while (i < n) {
  y = make_symbolic("*",
        {make_symbolic("+",
           {y,
            make_integer(4)}
        )}
      );
  i = i+1;
}

Program 4: *(p+4) iterated


int n; int i; int x; int rep;
tree t;
assume (n>=0);
i = 0;
t = make_symbolic("Nil",{});
while (i < n) {
  t = make_symbolic("Cons",
        {make_integer(x-1), t});
  t = make_symbolic("Cons",
        {make_integer(x+1), t});
  i = i + 1;
}
if (get_sym_head(t) != "Nil") {
  rep = get_num_head(get_son(t,0));
  assert(rep > x);
}

Program 5: List manipulation


Syntax of the Language and Concrete Operations. We assume already defined
a small imperative language and extend it (in Fig. 1) with statements, tree
expressions (tree-expr), which are expressions that are evaluated to trees, and
simple symbol expressions (sym-expr), which enable the manipulation of symbols.
We add the ability to build a tree which contains only a numerical leaf:
make_integer(e), the ability to read the i-th son of a tree t: get_son(t, i), ....
Figure 2 defines concrete operations over the set $\wp(T_{\mathbb{Z}}(R))$. Figure 2 assumes
given a set of program numerical variables $\mathcal{V}$, a set of numerical expressions
(over $\mathcal{V}$) denoted expr, a set of statements stmt, a notion of numerical
environment $E \in \mathcal{E} = \mathcal{V} \to \mathbb{Z}$, a set of tree program variables $\mathcal{T}$, and a notion of tree
environment $F \in \mathcal{F} = \mathcal{T} \to \wp(T_{\mathbb{Z}}(R))$; $\mathcal{D} = \mathcal{E} \times \mathcal{F}$ is our concrete domain.
Finally we assume already partially defined on numerical expressions an
evaluation function $\mathbb{E}[\![e \in expr]\!](E \in \mathcal{V} \to \mathbb{Z}, F \in \mathcal{T} \to \wp(T_{\mathbb{Z}}(R))) \in \wp(\mathbb{Z})$. Using
this operator we are able to define Program 4, which computes the memory zones
used by append from Program 1, and Program 5, which simulates the behavior of
Program 3.
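To give an intuition of these operations, the sketch below implements them in Python on single terms and replays Programs 4 and 5. It is a simplified illustration under stated assumptions: the concrete semantics of Fig. 2 works on sets of terms and returns the empty set when an operation is undefined, whereas this sketch works on one term at a time and fails with an error instead. The tuple encoding of terms and the functions program4 and program5 are introduced only for this example.

```python
def make_symbolic(symbol, sons):
    """Build the term symbol(sons...), encoded as a tuple."""
    return (symbol, *sons)


def make_integer(value):
    """Build a tree consisting of a single numerical leaf."""
    return value


def is_symbol(t):
    return isinstance(t, tuple)


def get_son(t, i):
    """i-th son (starting from 0) of a symbolic node."""
    _symbol, *sons = t
    return sons[i]


def get_num_head(t):
    assert isinstance(t, int), "the tree is not a numerical leaf"
    return t


def get_sym_head(t):
    assert is_symbol(t), "the tree is a numerical leaf"
    return t[0]


def program4(n):
    """Iterates y = *(y + 4), starting from the symbol p."""
    y = make_symbolic("p", [])
    for _ in range(n):
        y = make_symbolic("*", [make_symbolic("+", [y, make_integer(4)])])
    return y


def program5(x, n):
    """Builds a list alternating x+1 and x-1, then checks its head."""
    t = make_symbolic("Nil", [])
    for _ in range(n):
        t = make_symbolic("Cons", [make_integer(x - 1), t])
        t = make_symbolic("Cons", [make_integer(x + 1), t])
    if get_sym_head(t) != "Nil":
        rep = get_num_head(get_son(t, 0))
        assert rep > x
    return t
```

Running program5 on any concrete x and n >= 0 never triggers the assertion, which is the property the abstract analysis is expected to prove for all inputs at once.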


3 Natural Term Abstraction by Tree Automata 


In this section we start by defining a value abstraction for tree sets (in Sect. 3.1), 
which is then lifted to an environment abstraction (in Sect. 3.2). 


3.1 Value Abstraction 


As a first abstraction for natural terms, we put aside numerical values and define an abstraction able to describe sets of tree shapes. Tree automata enable the description of sets of terms built upon a finite ranked alphabet. The ranked alphabet of the language we want to analyze is extended with the $\square$ symbol to denote potential positions of numerical values.


Definition 2 (Finite tree automata). A finite tree automaton (FTA) over a ranked alphabet $\mathcal{R}$ is a tuple $(Q, \mathcal{R}, Q_f, \delta)$, where $Q$ is a (finite) set of states, $Q_f \subseteq Q$ is the set of final states, and $\delta \subseteq \bigcup_{n \in \mathbb{N}} (F_n \times Q^n \times Q)$ is the set of transitions. We define $\delta : \bigcup_{n \in \mathbb{N}} (F_n \times Q^n) \to \wp(Q)$ by: $\delta(f, \vec{q}) = \{ q' \mid (f, \vec{q}, q') \in \delta \}$. When $\delta$ is such that $\forall n \in \mathbb{N}, f \in F_n, \vec{q} \in Q^n$, $|\delta(f, \vec{q})| = 1$, we say that the automaton is complete and deterministic (CDFTA). We then abuse notations and denote by $\delta(f, \vec{q})$ the unique element in the set $\delta(f, \vec{q})$.


Definition 3 (Reachability). Given an FTA $A = (Q, \mathcal{R}, Q_f, \delta)$ we define a reachability function $\mathrm{REACH}_A : T(\mathcal{R}) \to \wp(Q)$:

$$\mathrm{REACH}_A(t) = \text{let } (t_1, \dots, t_n) = \mathrm{sons}(t) \text{ in } \bigcup_{(q_1, \dots, q_n) \in \mathrm{REACH}_A(t_1) \times \dots \times \mathrm{REACH}_A(t_n)} \delta(\mathrm{head}(t), (q_1, \dots, q_n))$$

If sons(t) is the empty tuple (which is the case when t is a constant a), the union is made over a unique element (which is the empty tuple), which then boils down to: $\delta(a, ())$. If sons(t) is not the empty tuple and for some i, $\mathrm{REACH}_A(t_i)$ is empty, then $\mathrm{REACH}_A(t)$ is also empty.


Example 3. Consider the ranked alphabet $\mathcal{R} = \{f(2), a(0)\}$, and the automaton $A = (\{u, v\}, \mathcal{R}, \{v\}, \{a() \to u,\ f(v, v) \to v,\ f(u, u) \to u,\ f(u, u) \to v\})$. Then $\mathrm{REACH}_A(a) = \{u\}$, $\mathrm{REACH}_A(f(a, a)) = \{u, v\}$, $\mathrm{REACH}_A(f(f(a, a), a)) = \{u, v\}$.


Definition 4 (Acceptance). Given an FTA $A = (Q, \mathcal{R}, Q_f, \delta)$ and a term t, we say that t is accepted by the automaton if $\mathrm{REACH}_A(t) \cap Q_f \neq \emptyset$. $\mathcal{L}(A)$ denotes the set of terms accepted by automaton A.


Example 4. With the definition of Example 3, L(A) is the set of terms over R 
that contain at least one f. 
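
The reachability function and the acceptance test can be sketched in a few lines of Python and checked against Examples 3 and 4. The encoding is an assumption made for this illustration: a term is a nested tuple `(head, son_1, ..., son_n)`, and transitions are stored in a dictionary from `(symbol, tuple of son states)` to a set of target states.

```python
from itertools import product

# Automaton A of Example 3.
DELTA = {
    ("a", ()): {"u"},
    ("f", ("v", "v")): {"v"},
    ("f", ("u", "u")): {"u", "v"},
}
FINAL = {"v"}


def reach(delta, t):
    """Set of states reachable at the root of term t."""
    head, sons = t[0], t[1:]
    son_states = [reach(delta, s) for s in sons]
    result = set()
    # For a constant, product() yields exactly one empty tuple.
    # If some son reaches no state, product() yields nothing.
    for states in product(*son_states):
        result |= delta.get((head, states), set())
    return result


def accepts(delta, final, t):
    """A term is accepted when it reaches at least one final state."""
    return bool(reach(delta, t) & final)


A_TERM = ("a",)
F_AA = ("f", A_TERM, A_TERM)
```

Here `reach(DELTA, F_AA)` evaluates to `{"u", "v"}`, and the constant `a` is the only term over this alphabet that is rejected.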


Definition 5 (Tree regular languages). A set of terms T over a ranked 
alphabet R is called tree regular if there exists a FTA A over R such that 
L(A) =T. The set of such languages is denoted TReg(R). 


Remark 2. As for regular languages, for all $A \in \mathrm{FTA}$ there exists $A' \in \mathrm{CDFTA}$ such that $\mathcal{L}(A) = \mathcal{L}(A')$; moreover $A'$ is computable (see [4]).


Example 5.

- As shown in Example 4, the set of all terms over $\{f(2), a(0)\}$ that contain at least one f is tree regular.

- Consider now the ranked alphabet $\{a(1), b(1), \epsilon(0)\}$ and the set of terms $T = \{\epsilon, a(b(\epsilon)), a(a(b(b(\epsilon)))), \dots\}$. We can prove (in a similar way as for $a^n b^n$ in regular languages) that T is not tree regular.

- On every ranked alphabet $\mathcal{R}$: every finite language, the empty language and $T(\mathcal{R})$ are tree regular.


Proposition 1. $(\mathrm{TReg}(\mathcal{R}), \subseteq, \cap, \cup, \cdot^c, \emptyset, T(\mathcal{R}))$ is a complemented lattice with infinite height; moreover it is not complete. $\subseteq$, $\cap$, $\cup$ and complementation ($\cdot^c$) are computable operations on tree automata [4].


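We denote by $\mathcal{R}_\square$ the ranked alphabet $\mathcal{R}$ after adding the symbol $\square$ of arity 0 (we assume that $\square \notin \mathcal{R}$). Given a natural term t, we define $t_\square$ to be the term obtained by replacing every integer with the $\square$ symbol.


Proposition 2. $(\wp(T_{\mathbb{Z}}(\mathcal{R})), \subseteq) \xleftarrow{\ \gamma\ } (\mathrm{TReg}(\mathcal{R}_\square), \subseteq)$ where $\gamma(A) = \{ t \mid t_\square \in \mathcal{L}(A) \}$ is a representation. Moreover with such a definition of $\gamma$, $\cup$ and $\cap$ soundly represent the union and the intersection.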


Remark 3. We only have a representation and not a Galois connection, as language T of Example 5 does not have a best tree regular over-approximation.


Example 6. Let $\mathcal{R} = \{+(2)\}$ and $A = (\{0, 1\}, \mathcal{R}_\square, \{0, 1\}, \{\square() \to 0,\ +(0, 0) \to 1,\ +(0, 1) \to 1\})$. Examples of terms recognized by A are shown on Fig. 3. Natural terms from our running example U and V (defined in Example 2) are also contained in $\gamma(A)$. Moreover, as we do not provide numerical constraints, 1 + (3 + 4), 23, 1 + (2 + (3 + 4)) are also elements of $\gamma(A)$.


Due to the infinite height of the lattice, a widening operator is required. In the following, we assume given a constant $w \in \mathbb{N}$, which will be used to stabilize increasing chains: the greater the constant, the more precise our widening operator will be.

Definition 6. Let $A = (Q, \mathcal{R}, Q_f, \delta) \in \mathrm{FTA}$, and $\sim$ be an equivalence relation on Q such that $p \sim q \land p \in Q_f \Rightarrow q \in Q_f$. We define $A/{\sim} = (Q/{\sim}, \mathcal{R}, Q_f/{\sim}, \bigcup_{f(q_1, \dots, q_n) \to q \in \delta} \{ f(q_1^\sim, \dots, q_n^\sim) \to q^\sim \})$ where $q^\sim$ is the equivalence class of q in $\sim$.


Proposition 3. For every $A \in \mathrm{FTA}$ and every equivalence relation $\sim$ on its states, $\mathcal{L}(A) \subseteq \mathcal{L}(A/{\sim})$.


Therefore, following the idea from [9] and [11], we define a widening operation by quotienting states of automata by an equivalence relation of finite index. We define by induction a special sequence of equivalence relations on states of tree automata: $\sim_1 = \{Q_f, Q \setminus Q_f\}$ and $\sim_{k+1}$ is $\sim_k$ where we split equivalence classes not satisfying the following condition: $\forall f \in F_n, \forall p_1, \dots, p_n \in Q, \forall q_1, \dots, q_n \in Q, (\bigwedge_{i=1}^{n} p_i \sim_k q_i) \Rightarrow \delta(f, p_1, \dots, p_n) \sim_k \delta(f, q_1, \dots, q_n)$ and $\forall q \in Q_f, q^\sim \subseteq Q_f$. This sequence of equivalence relations is the Myhill-Nerode sequence (see [4]). This sequence is of length at most the number of states of the automaton (before stabilization). Let $\phi(w) = \max\{ i \leq |Q| \mid \text{index of } \sim_i\ \leq w \}$ (given an integer w, $\phi$ yields the index of the most precise of the equivalence relations in the Myhill-Nerode sequence that contains at most w equivalence classes) and $[A]_w = A/{\sim_{\phi(w)}}$. $[A]_w$ is therefore an FTA with at most w states such that $\mathcal{L}(A) \subseteq \mathcal{L}([A]_w)$. As for regular languages, for every CDFTA an equivalent minimal CDFTA (in the sense of the number of states, and unique modulo state renaming) can be obtained by quotienting the automaton by $\sim_{|Q|}$. Therefore we define a widening operator on CDFTAs, which is then lifted to tree regular languages.


Definition 7 (Widening operator $\nabla$). $A \nabla A' = [A \cup A']_w$.

Proposition 4. This widening is sound and stabilizes infinite sequences.


Remark 4. Consider the two following complete and deterministic tree automata: $A = (\{a, b, h\}, \{+(2)\}, \{a\}, \{\square() \to b,\ +(b, b) \to a\})$ and $B = (\{a, b, c, h\}, \{+(2)\}, \{a\}, \{\square() \to b,\ +(b, b) \to c,\ +(b, c) \to a\})$ (unmentioned transitions go to h). A (resp. B) recognizes the tree $+(\square, \square)$ (resp. $+(\square, +(\square, \square))$); it over-approximates U (resp. V) from our running example. $A \cup B$ is recognized by the following complete and deterministic tree automaton: $C = (\{a, b, c, h\}, \{+(2)\}, \{a, c\}, \{\square() \to b,\ +(b, b) \to c,\ +(b, c) \to a\})$. If we want to widen A and B with parameter 3, the following equivalence relation is computed: $\{\{h\}, \{b\}, \{a, c\}\}$. Merging equivalent states produces $(\{a, b, h\}, \{+(2)\}, \{a\}, \{\square() \to b,\ +(b, b) \to a,\ +(b, a) \to a\})$, which contains a loop and over-approximates the union.
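
The quotienting step of Definition 6 can be illustrated on automaton C of Remark 4. The sketch below is hedged: it takes the equivalence relation as an input (it does not compute the Myhill-Nerode sequence), it encodes the relation as a hypothetical map `cls` from each state to a class representative, it writes the numerical placeholder symbol as the string `"box"`, and it leaves the sink state h implicit by simply omitting the transitions that lead to it.

```python
from itertools import product


def quotient(delta, final, cls):
    """Quotient an FTA by an equivalence relation on its states.

    `cls` maps every state to the representative of its class."""
    new_delta = {}
    for (sym, sons), targets in delta.items():
        key = (sym, tuple(cls[q] for q in sons))
        new_delta.setdefault(key, set()).update(cls[q] for q in targets)
    return new_delta, {cls[q] for q in final}


def reach(delta, t):
    """States reachable at the root of a term (head, son_1, ..., son_n)."""
    son_states = [reach(delta, s) for s in t[1:]]
    result = set()
    for states in product(*son_states):
        result |= delta.get((t[0], states), set())
    return result


# Automaton C of Remark 4, without its sink state h.
DELTA_C = {
    ("box", ()): {"b"},
    ("+", ("b", "b")): {"c"},
    ("+", ("b", "c")): {"a"},
}
FINAL_C = {"a", "c"}

# Equivalence relation {{b}, {a, c}}: c is merged into a.
CLS = {"a": "a", "b": "b", "c": "a"}
DELTA_W, FINAL_W = quotient(DELTA_C, FINAL_C, CLS)


def comb(k):
    """Right comb with k occurrences of +, e.g. comb(1) = +(box, box)."""
    t = ("box",)
    for _ in range(k):
        t = ("+", ("box",), t)
    return t
```

The quotient accepts right combs of any positive depth, whereas C only accepts the combs of depth 1 and 2; this is the loop mentioned in Remark 4.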

3.2 Environment Abstraction


Now that we are given an abstraction for natural term sets, let us show how this is lifted to a notion of abstract natural term environments mapping variables to natural terms. Given a set of natural term variables $\mathcal{T}$, consider $\mathcal{F}^\sharp = (\mathcal{T} \to \mathrm{TReg}(\mathcal{R}_\square)) \cup \{\bot\}$ and the set operators defined by the point-wise lifting of operators on $\mathrm{TReg}(\mathcal{R}_\square)$. We also lift the concretization function $\mathrm{TReg}(\mathcal{R}_\square) \to \wp(T_{\mathbb{Z}}(\mathcal{R}))$ to $\mathcal{F}^\sharp \to \mathcal{F}$. We assume given an abstract numerical environment $E^\sharp$ and an abstract evaluator $\mathbb{E}\llbracket e \rrbracket^\sharp$. Abstract transformers $\llbracket \texttt{make\_symbolic} \rrbracket^\sharp$, $\llbracket \texttt{is\_symbol} \rrbracket^\sharp$, $\llbracket \texttt{get\_son}(e) \rrbracket^\sharp$, $\llbracket \texttt{get\_sym\_head} \rrbracket^\sharp$ and $\llbracket \texttt{get\_num\_head} \rrbracket^\sharp$ are simple tree automata operations. For concision Fig. 4 only provides definitions of two of these operators. Please note that these definitions require all states of the automata to be reachable. An example of use of the is_symbol operator can be found in Example 7. Other abstract operators are similar.

Fig. 3. Example of accepted trees from Example 6


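$\mathbb{E}^\sharp\llbracket \texttt{make\_integer}(e \in expr) \rrbracket(E^\sharp, F^\sharp) = (\{a\}, \mathcal{R}_\square, \{a\}, \{\square() \to a\})$

$\mathbb{E}^\sharp\llbracket \texttt{get\_son}(T, e \in expr) \rrbracket(E^\sharp, F^\sharp) = \bigcup_{\substack{(Q, \mathcal{R}, Q_f, \delta) \in \mathbb{E}^\sharp\llbracket T \rrbracket(E^\sharp, F^\sharp) \\ i \in \mathbb{E}^\sharp\llbracket e \rrbracket(E^\sharp) \cap \{0, \dots, m-1\}}} (Q, \mathcal{R}, \{ q \in Q \mid \exists p \in Q_f, \exists s(p_0, \dots, p_{m-1}) \to p \in \delta \land p_i = q \}, \delta)$


Fig. 4. Abstract operators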


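Example 7. Consider the tree automaton A of Example 6 (Fig. 3), with $F^\sharp = (x \mapsto A)$: $\llbracket \texttt{get\_sym\_head}(x) \rrbracket^\sharp(E^\sharp, F^\sharp) = \{+\}$ and $\llbracket \texttt{get\_num\_head}(x) \rrbracket^\sharp(E^\sharp, F^\sharp) = \top$.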


4 Numerical Abstractions 


As emphasized in the introductory example, we rely on numerical domains to introduce constraints on numerical variables found in trees. In a classical numeric abstraction (e.g. intervals [6], octagons [22], polyhedra [8], ...), each abstract element represents a set of maps $\mathcal{V} \to \mathbb{R}$ for a fixed, finite set of variables $\mathcal{V}$. In contrast, our numeric variables are leaves of a possibly infinite set of trees of unbounded size. Hence, before starting the presentation of the numerical abstraction for natural terms, we show how to extend an abstract element in a generic way, in two steps. Firstly we want to be able to represent a set of maps, where each map is defined over a (possibly different) finite subset of an infinite set of variables (this is done in Sect. 4.1). Secondly, we use summarization variables to relax the finiteness constraint, so as to represent sets of maps with heterogeneous supports over infinitely many variables (done in Sect. 4.2).

4.1 Heterogeneous Support


We define $\mathfrak{M} = \wp(\mathcal{V} \rightharpoonup \mathbb{R})$, the set of sets of partial maps from $\mathcal{V}$ to $\mathbb{R}$. $\mathfrak{M}$ is ordered by the inclusion relation $\subseteq$. In the following def(f) denotes the definition set of f. We assume defined a representation $(\wp(S \to \mathbb{R}), \subseteq) \xleftarrow{\ \gamma_S\ } (\mathcal{N}_S, \sqsubseteq_S)$, for every finite set $S \subseteq \mathcal{V}$ (such as octagons in $|S|$ dimensions). $\mathcal{N}_S$ comes with the usual abstract set operators $\sqcap_S$, $\sqcup_S$. Moreover if $x \in S$, $y \notin S$, $S'$ is another finite set and $N^\sharp \in \mathcal{N}_S$, then $N^\sharp[x \mapsto y] \in \mathcal{N}_{(S \cup \{y\}) \setminus \{x\}}$ is the abstract element obtained by renaming x into y, and $N^\sharp|_{S'} \in \mathcal{N}_{S'}$ is obtained by existentially quantifying dimensions associated to elements in S and not in $S'$ and adding unconstrained dimensions for elements in $S'$ and not in S. From now on we assume that this last operator is exact (as for intervals, octagons, polyhedra over $\mathbb{R}$). However results from this section can be extended to numerical domains that are able, given $N^\sharp \in \mathcal{N}_S$ and $N^{\sharp\prime} \in \mathcal{N}_{S'}$, to check if $\gamma_S(N^\sharp) \subseteq \gamma_{S'}(N^{\sharp\prime})|_S$. The precision of the extension defined in this subsection would then depend upon the precision of this test in the underlying domain. Finally $\llbracket \cdot \rrbracket$ (resp. $\llbracket \cdot \rrbracket^\sharp$) refers to the classical concrete (resp. abstract) semantics of operators on sets of numerical maps (resp. abstract elements).

A classical method for the abstraction of heterogeneous maps is the use of a partitioning of the concrete element according to the definition set of its represented maps. However partitioning induces an increase in numerical operation cost (exponential in the number of variables) which we would like to avoid. Therefore, in order to abstract sets of maps with heterogeneous definition sets, we start by abstracting the potential definition set. We choose a simple lower-bound/upper-bound abstraction (l and u in the following definition). Moreover we need to abstract the potential mappings given a definition set: this is done using a classical numerical domain. Contrary to partitioning, we will use only one numerical abstract element, defined on the upper-bound u, to represent all environments (instead of one abstract element per definition set). We also add a $\top$ element, used in the case where the upper bound u is infinite.


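Definition 8 (Numerical abstraction). Let us define the following set: $\mathfrak{M}^\sharp = \{ (N^\sharp, l, u) \mid l, u \in \wp(\mathcal{V}) \land l \text{ and } u \text{ are finite} \land l \subseteq u \land N^\sharp \in \mathcal{N}_u \land N^\sharp \neq \bot \} \cup \{\top, \bot\}$. An element of $\mathfrak{M}^\sharp$ is therefore either $\top$, $\bot$ or a triple $(N^\sharp, l, u)$ where l and u are finite sets of variables such that $N^\sharp$ is defined over u.


Definition 9 (Concretization function). Abstract elements from $\mathfrak{M}^\sharp$ are mapped to $\mathfrak{M}$ thanks to the following concretization function: $\gamma(\bot) = \emptyset$, $\gamma(\top) = \mathfrak{M}$ and $\gamma(N^\sharp, l, u) = \{ \rho \in S \to \mathbb{Z} \mid l \subseteq S \subseteq u \land \rho \in \gamma_S(N^\sharp|_S) \}$.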


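Example 8. As an example consider $\gamma((\{x = y, x < 3, z = 0\}, \{x\}, \{x, y, z\})) = \{ (x \mapsto \alpha) \mid \alpha < 3 \} \cup \{ (x \mapsto \alpha, y \mapsto \alpha) \mid \alpha < 3 \} \cup \{ (x \mapsto \alpha, z \mapsto 0) \mid \alpha < 3 \} \cup \{ (x \mapsto \alpha, y \mapsto \alpha, z \mapsto 0) \mid \alpha < 3 \}$. As intended, the resulting set of maps contains maps with different definition sets.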
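
The concretization of Definition 9 can be explored with a brute-force membership test. This is only a sketch under explicit assumptions: the numerical element is modelled as a Python predicate over total maps on u, values are integers, and the existential quantification of the missing variables is approximated by a search over a small finite range (the `values` parameter), which is enough for the example but is not a general decision procedure.

```python
from itertools import product


def in_gamma(rho, constraint, lower, upper, values=range(-5, 6)):
    """Brute-force check that the partial map rho belongs to gamma(N, l, u).

    rho        : dict from variable names to integers
    constraint : predicate over total maps on `upper` (stands for N)
    values     : finite search range for quantified variables (assumption)
    """
    support = set(rho)
    # The definition set must lie between the lower and upper bounds.
    if not (lower <= support <= upper):
        return False
    missing = sorted(upper - support)
    # rho is in the projection of N on its support iff some extension
    # of rho to the whole of `upper` satisfies N.
    return any(
        constraint({**rho, **dict(zip(missing, extension))})
        for extension in product(values, repeat=len(missing))
    )


def n_example(m):
    """Numerical component of Example 8: x = y, x < 3, z = 0."""
    return m["x"] == m["y"] and m["x"] < 3 and m["z"] == 0


LOWER, UPPER = {"x"}, {"x", "y", "z"}
```

For instance `in_gamma({"x": 1}, n_example, LOWER, UPPER)` holds, while a map defined only on y is rejected because x belongs to the lower bound.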
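
Definition 10 (Order). On $\mathfrak{M}^\sharp$ we define the following comparison operator: $(N^\sharp, l, u) \sqsubseteq (N^{\sharp\prime}, l', u') \Leftrightarrow l' \subseteq l \subseteq u \subseteq u' \land N^\sharp \sqsubseteq_u N^{\sharp\prime}|_u$; this comparison is trivially extended to $\top$ (resp. $\bot$) as being the biggest (resp. smallest) element in $\mathfrak{M}^\sharp$. In the following $\mathfrak{m}^\sharp$ denotes the subset of $\mathfrak{M}^\sharp$ where u = p extended with $\top$ and $\bot$.


Proposition 5. $\gamma$ is monotonic for $\sqsubseteq$.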


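Figure 5 provides the definition of the concrete and abstract semantics of the classical numerical statements, Assume and Assign (denoted $x \leftarrow e$). We denote by vars(e) the set of variables appearing in e. We recall that $\llbracket \mathrm{Assume}(c) \rrbracket_S(E \in \wp(S \to \mathbb{R})) = \{ f \in E \mid \mathrm{true} \in \mathbb{E}\llbracket c \rrbracket(f) \}$ and $\llbracket x \leftarrow e \rrbracket_S(E \in \wp(S \to \mathbb{R})) = \{ f[x \mapsto e'] \mid f \in E \land e' \in \mathbb{E}\llbracket e \rrbracket(f) \}$. In order to ease the lifting of these classical operators we define $\llbracket stmt \rrbracket_o(M \in \mathfrak{M}) = \bigcup_{S \text{ finite} \subseteq \mathcal{V}} \llbracket stmt \rrbracket_S(M \cap (S \to \mathbb{R}))$, for every statement stmt. Moreover we assume the existence of the following abstract operators: $\llbracket \mathrm{Assume}(c) \rrbracket^{\sharp,u}(N^\sharp)$ and $\llbracket x \leftarrow e \rrbracket^{\sharp,u}(N^\sharp)$, soundly abstracting their respective concrete transformers. Note that the concrete semantics of Assume(c) (resp. $x \leftarrow e$) enforces that maps are defined at least on the variables appearing in c (resp. in e and on x). Abstract operators from Fig. 5 are sound with respect to $\gamma$ and their concrete operators.


$\llbracket \mathrm{Assume}(c) \rrbracket(M) = \llbracket \mathrm{Assume}(c) \rrbracket_o(\{ f \mid f \in M \land vars(c) \subseteq def(f) \})$
$\llbracket \mathrm{Assume}(c) \rrbracket^\sharp((N^\sharp, l, u)) = (\llbracket \mathrm{Assume}(c) \rrbracket^{\sharp,u}(N^\sharp),\ l \cup vars(c),\ u)$
$\llbracket x \leftarrow e \rrbracket(M) = \llbracket x \leftarrow e \rrbracket_o(\{ f \mid f \in M \land vars(e) \cup \{x\} \subseteq def(f) \})$
$\llbracket x \leftarrow e \rrbracket^\sharp((N^\sharp, l, u)) = (\llbracket x \leftarrow e \rrbracket^{\sharp,u}(N^\sharp),\ l \cup vars(e) \cup \{x\},\ u)$


Fig. 5. Concrete and abstract semantics of usual numerical operators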


We now need to define $\sqcup$, which abstracts the classic set operator $\cup$. We cannot directly apply the corresponding abstract operator on the numerical component of the abstractions as they might have different definition sets. A first naive solution would be to extend their respective definition sets and to perform the abstract operation on the resulting elements: $N^\sharp|_{u \cup u'} \sqcup_{u \cup u'} N^{\sharp\prime}|_{u \cup u'}$. However consider $M = (\{x = y\} (= U^\sharp), \{x, y\}, \{x, y\})$ and $N = (\{x = z\} (= V^\sharp), \{x, z\}, \{x, z\})$, where the underlying domain is the octagon domain where elements are represented as a set of linear constraints (e.g. $\{x = y\}$). We have $U^\sharp|_{\{x,y,z\}} = \{x = y\}$ and $V^\sharp|_{\{x,y,z\}} = \{x = z\}$, hence $U^\sharp|_{\{x,y,z\}} \sqcup_{\{x,y,z\}} V^\sharp|_{\{x,y,z\}} = \top$. Consider now the abstract element in $\mathfrak{M}^\sharp$: $R = (\{x = y, x = z\} (= W^\sharp), \{x\}, \{x, y, z\})$. The concretization of R over-approximates the union of the concretizations of M and N, and its numerical component is more precise than $\top$. We note that the numerical constraints appearing in $W^\sharp$ could be found in $U^\sharp$ or $V^\sharp$; therefore, in order to remove the aforementioned imprecision, we define a refined abstract union operator, denoted as $\uplus$, that uses constraints found in the inputs in order to refine its result. This is done using the strengthening operator of Algorithm 1, which adds constraints from C that do not make the projection of $X^\sharp$ to u (resp. v) lower than the threshold $U^\sharp$ (resp. $V^\sharp$). We assume that, given an abstract element $U^\sharp$, we can extract a finite set of constraints satisfied by $U^\sharp$; those are denoted constraints($U^\sharp$) (the more constraints can be extracted, the more precise the result will be). For example if the numerical domain is the interval domain, constraints have the form $\pm x \geq a$. If the numerical domain is the octagon domain, the constraints operator yields all the linear relations among variables that define the octagon.


Algorithm 1. strengthening operator

    Input : X♯, C: a set of constraints, U♯ ∈ N_u: a soundness threshold on
            environment u, V♯ ∈ N_v: a soundness threshold on environment v
    Output: Z♯, an abstract element over-approximating U♯ on u and V♯ on v

    Z♯ ← X♯;
    foreach c ∈ C do
        T♯ ← [[Assume(c)]]^{♯,u∪v}(Z♯);
        if U♯ ⊑_u T♯|_u ∧ V♯ ⊑_v T♯|_v then
            Z♯ ← T♯;
    end
    return Z♯;


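Definition 11 ($\uplus$ operator). Let $U^\sharp \in \mathcal{N}_u$, $V^\sharp \in \mathcal{N}_v$ be two numerical environments, let $X^\sharp \in \mathcal{N}_{u \cup v}$, let C be a sequence of numerical constraints over $u \cup v$, and let $c = u \cap v$; we define:

$$U^\sharp \uplus V^\sharp = \text{let } X^\sharp = (U^\sharp|_c \sqcup_c V^\sharp|_c)|_{u \cup v} \text{ in let } C = \mathrm{constraints}(U^\sharp) \cup \mathrm{constraints}(V^\sharp) \text{ in } \mathrm{strengthening}(X^\sharp, C, U^\sharp, V^\sharp)$$


Remark 5.

- The precision of $\uplus$ depends upon the order of iteration over constraints $c \in C$ in Algorithm 1. Our implementation currently iterates in the order in which constraints are returned from the abstract domains. More clever heuristics will be considered in future work.

- $U^\sharp \uplus V^\sharp$ starts by performing the join over the domain c; the result is then strengthened. Other strengthening($X^\sharp$, $U^\sharp \in \mathcal{N}_u$, $V^\sharp \in \mathcal{N}_v$) operators could be defined; however, in order to ensure soundness of $\uplus$, such an operator must satisfy the following constraints: $U^\sharp \sqsubseteq_u \mathrm{strengthening}(X^\sharp, U^\sharp, V^\sharp)|_u$ and $V^\sharp \sqsubseteq_v \mathrm{strengthening}(X^\sharp, U^\sharp, V^\sharp)|_v$.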


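Example 9. Let us now consider the example introduced above: $U^\sharp \uplus V^\sharp = \{x = y, x = z\} \in \mathcal{N}_{\{x,y,z\}}$. Indeed, using the notations of Definition 11: $Z^\sharp = X^\sharp = \top \in \mathcal{N}_{\{x,y,z\}}$ and $C = \{x = y, x = z\}$; moreover $\llbracket \mathrm{Assume}(x = y) \rrbracket^{\sharp,u \cup v}(\top) = \{x = y\} (= T^\sharp)$, $U^\sharp \sqsubseteq_{\{x,y\}} \{x = y\} = T^\sharp|_{\{x,y\}}$ and $V^\sharp \sqsubseteq_{\{x,z\}} \top = T^\sharp|_{\{x,z\}}$. Therefore constraint x = y is added to $Z^\sharp$. At the next loop iteration: $\llbracket \mathrm{Assume}(x = z) \rrbracket^{\sharp,u \cup v}(\{x = y\}) = \{x = y, x = z\} (= T^\sharp)$, $U^\sharp \sqsubseteq_{\{x,y\}} \{x = y\} = T^\sharp|_{\{x,y\}}$, and $V^\sharp \sqsubseteq_{\{x,z\}} \{x = z\} = T^\sharp|_{\{x,z\}}$. Therefore constraint x = z is added to $Z^\sharp$.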
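
The strengthening loop of Algorithm 1 can be replayed on this example with a deliberately tiny numerical domain. The sketch below is an assumption-laden illustration, not the octagon domain used in the text: an abstract element is a partition of its variables into classes of equal variables, so only equality constraints are expressible, and the starting element is directly taken to be the unconstrained element on {x, y, z}. All function names are hypothetical.

```python
def top(variables):
    """Unconstrained element: every variable is alone in its class."""
    return {frozenset([v]) for v in variables}


def assume_eq(part, x, y):
    """Abstract Assume(x = y): merge the classes of x and y."""
    cx = next(c for c in part if x in c)
    cy = next(c for c in part if y in c)
    return (part - {cx, cy}) | {cx | cy}


def project(part, variables):
    """Projection on `variables` (exact for pure equalities)."""
    return {c & variables for c in part if c & variables}


def leq(a, b):
    """a is below b when every equality of b also holds in a."""
    return all(any(cb <= ca for ca in a) for cb in b)


def strengthening(x_elem, constraints, u_elem, u, v_elem, v):
    """Add each constraint unless it breaks one of the two thresholds."""
    z = x_elem
    for (p, q) in constraints:
        t = assume_eq(z, p, q)
        if leq(u_elem, project(t, u)) and leq(v_elem, project(t, v)):
            z = t
    return z


U_VARS, V_VARS = frozenset("xy"), frozenset("xz")
U_ELEM = assume_eq(top(U_VARS), "x", "y")   # {x = y}
V_ELEM = assume_eq(top(V_VARS), "x", "z")   # {x = z}
RESULT = strengthening(top(U_VARS | V_VARS), [("x", "y"), ("x", "z")],
                       U_ELEM, U_VARS, V_ELEM, V_VARS)
```

Both constraints are kept, so `RESULT` contains the single class {x, y, z}, which encodes x = y and x = z. If the first threshold did not entail x = y, the candidate would be rejected and the element left unchanged.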


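Proposition 6 (Soundness of $\uplus$). Let $U^\sharp \in \mathcal{N}_u$ and $V^\sharp \in \mathcal{N}_v$; then $\gamma_u(U^\sharp) \subseteq (\gamma_{u \cup v}(U^\sharp \uplus V^\sharp))|_u$ and $\gamma_v(V^\sharp) \subseteq (\gamma_{u \cup v}(U^\sharp \uplus V^\sharp))|_v$.


Definition 12 (Union abstract operators). We define the following abstract set operator: $(N^\sharp, l, u) \sqcup (N^{\sharp\prime}, l', u') = (N^\sharp \uplus N^{\sharp\prime}, l \cap l', u \cup u')$. This operator soundly abstracts the union. Moreover, in order to ensure the stabilization of infinitely increasing chains in $\mathfrak{M}^\sharp$, we define the following widening operator:

$$(N^\sharp, l, u) \nabla (N^{\sharp\prime}, l', u') = \begin{cases} (N^\sharp \nabla N^{\sharp\prime}, l, u) & \text{when } l \subseteq l' \land u' \subseteq u \\ (N^\sharp \uplus N^{\sharp\prime}, l', u) & \text{when } l' \subset l \land u' \subseteq u \\ \top & \text{otherwise} \end{cases}$$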


Remark 6. This widening operator over-approximates to $\top$ whenever the upper bound on the definition set is growing. This yields a huge loss of information; however, this numerical domain is designed as a tool domain used by a higher-level abstraction in charge of stabilizing the environment before applying the widening, so that this case will not be used in practice.


Subsequent tree abstractions require the definition of the following operators:

- $(N^\sharp, l, u)|_{-x} = (N^\sharp|_{u \setminus \{x\}}, l \setminus \{x\}, u \setminus \{x\})$ and $(N^\sharp, l, u)|_{+x} = (N^\sharp|_{u \cup \{x\}}, l \cup \{x\}, u \cup \{x\})$, which respectively remove a variable from and add a variable to the numerical environment.

- $(N^\sharp, l, u)|_S$ is computed by adding variables in S and not in u and removing variables in u that are not in S.


4.2 Representation of Maps over Potentially Unbounded Sets 


In this subsection we focus on the problem of defining abstract numerical envi- 
ronments on potentially infinite environments. A classical method we use here is 
variable summarization (see [13]). This is based on the folding of several concrete 
objects (a potentially infinite number) to an abstract element which summarizes 
all concrete objects. The folding is encoded in a function f mapping summa- 
rized variables to the set of concrete variables they abstract. Given an abstract 
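numerical environment $N^\sharp$ and a mapping from summary variables $\mathcal{V}'$ to sets of concrete variables, $f \in \mathcal{V}' \to \wp(\mathcal{V})$ where $f(v_1) \cap f(v_2) \neq \emptyset \Rightarrow v_1 = v_2$, we define the collapsing of a partial map $\rho \in \mathcal{V} \rightharpoonup \mathbb{Z}$ under a summarizing function f:

$$\downarrow_f(\rho) = \{ \rho' \in \mathcal{V}' \rightharpoonup \mathbb{Z} \mid \forall v' \in \mathcal{V}',\ (f(v') \cap def(\rho) = \emptyset \land \rho'(v') = \text{undefined}) \lor (\exists v \in \mathcal{V},\ v \in f(v') \cap def(\rho) \land \rho'(v') = \rho(v)) \}$$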

Example 10. Consider $\mathcal{V}' = \{x, y, z, t\}$ and $\mathcal{V} = \{a, b, c, d, g, h\}$, the environment $\rho = (a \mapsto 0, b \mapsto 1, c \mapsto 2, d \mapsto 3)$ and finally the summarizing function $f = (x \mapsto \{a\}, y \mapsto \{b, c\}, z \mapsto \{d\}, t \mapsto \{g\})$. Collapsing environment $\rho$ under f yields the set of environments: $(x \mapsto 0, y \mapsto 1, z \mapsto 3)$ and $(x \mapsto 0, y \mapsto 2, z \mapsto 3)$.
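
The collapsing operation is easy to reproduce on this example. The following Python sketch assumes that partial maps are dictionaries and that the summarizing function is a dictionary from summary variables to sets of concrete variables; the function name `collapse` is introduced here for illustration.

```python
from itertools import product


def collapse(rho, f):
    """All maps over summary variables obtained by collapsing rho under f."""
    choices = []
    for summary in sorted(f):
        present = sorted(f[summary] & set(rho))
        if present:
            # The summary variable takes the value of any one of the
            # concrete variables it represents and on which rho is defined.
            choices.append([(summary, rho[v]) for v in present])
        # Otherwise the collapsed map is undefined on this summary variable.
    return [dict(combination) for combination in product(*choices)]


RHO = {"a": 0, "b": 1, "c": 2, "d": 3}
F = {"x": {"a"}, "y": {"b", "c"}, "z": {"d"}, "t": {"g"}}
COLLAPSED = collapse(RHO, F)
```

The summary variable t does not appear in the result because the environment is not defined on g, and y yields two environments because it summarizes both b and c.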


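Given a summarizing function f we can now define an extension of the concretization function $\gamma$ of the previous subsection in the following manner:

$$\gamma[f](N^\sharp) = \{ \rho \in \mathcal{V} \rightharpoonup \mathbb{Z} \mid \downarrow_f(\rho) \subseteq \gamma(N^\sharp) \}$$


Example 11. Going back to Example 10 and considering the numerical abstract element $N^\sharp = (\{x < y\}, \{x\}, \{x, y\})$, we have: $\gamma(N^\sharp) = \{ (x \mapsto \alpha) \mid \alpha \in \mathbb{Z} \} \cup \{ (x \mapsto \alpha, y \mapsto \beta) \mid \alpha < \beta \}$. We have: $m \in \gamma[f](N^\sharp) \Leftrightarrow \downarrow_f(m) \subseteq \gamma(N^\sharp) \Rightarrow \{x\} \subseteq def(\downarrow_f(m)) \subseteq \{x, y\}$. Therefore if we assume m defined on d then $f(z) \cap def(m) \neq \emptyset$, hence there would be an element in $\downarrow_f(m)$ defined on z. Hence m is not defined on d, and similarly for g. Moreover $\{x\} \subseteq def(\downarrow_f(m))$ implies that m is defined on a. Finally, defining $S = \{ (a \mapsto \alpha) \mid \alpha \in \mathbb{Z} \} \cup \{ (a \mapsto \alpha, b \mapsto \beta) \mid \alpha < \beta \} \cup \{ (a \mapsto \alpha, c \mapsto \beta) \mid \alpha < \beta \} \cup \{ (a \mapsto \alpha, b \mapsto \beta, c \mapsto \gamma) \mid \alpha < \beta \land \alpha < \gamma \}$, we have: $\gamma[f](N^\sharp) = S \cup (\bigcup_{s \in S} \{ s \cup (h \mapsto \delta) \mid \delta \in \mathbb{Z} \})$.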


The abstract domains we will define in the following sections will employ this summarization framework. The manipulation of summarized variables requires the definition of a fold(E, x, S) (resp. expand(E, x, S)) operator yielding a new environment where x is used as a summary variable for S (resp. where a summary variable x is desummarized into a set of variables S). Let S and $S'$ be two finite sets of elements such that $S' \cap S \subseteq \{x\}$; we define: $\mathrm{expand}_o(N^\sharp, x, S') = \bigsqcap_{v \in S'} N^\sharp[x \mapsto v]|_{(S \setminus \{x\}) \cup S'}$ and $\mathrm{fold}_o(N^\sharp, x, S') = \bigsqcup_{v \in S'} N^\sharp[v \mapsto x]|_{(S \setminus S') \cup \{x\}}$ (which generalize the ones introduced in [13]). These operations are lifted as operators on elements of $\mathfrak{M}^\sharp$:

$$\mathrm{expand}((N^\sharp, l, u), x, S) = (\mathrm{expand}_o(N^\sharp, x, S),\ l \setminus \{x\},\ (u \setminus \{x\}) \cup S)$$

$$\mathrm{fold}((N^\sharp, l, u), x, S) = \left(\mathrm{fold}_o(N^\sharp, x, S),\ \begin{cases} (l \setminus S) \cup \{x\} & \text{if } S \subseteq l \\ l \setminus S & \text{otherwise} \end{cases},\ (u \setminus S) \cup \{x\}\right)$$


5 Natural Term Abstraction by Numerical Constraints 


We are now able to represent sets of maps with heterogeneous supports and to 
lift their concretization (modulo a summarization function) to sets of maps with 
infinite and heterogeneous supports. Given a tree shape (in the sense of Sect. 3), 
we can associate a numeric variable to each numeric leaf, and use a numeric 
abstract element to represent the possible values of these leaves. We will name 
the variable of each leaf as the path from the root to the leaf, i.e., $\mathcal{V}$ is a set of words in $\{0, \dots, n-1\}^*$ where n is the maximum arity of the considered ranked alphabet. In order to avoid confusion such paths will be denoted $\langle 0, 1, 1 \rangle$ for the word (0, 1, 1). A summarized variable then represents a set of such paths. We will abstract such sets as regular expressions. Using the summarization extended to heterogeneous supports presented in the previous section, it will be possible to represent, using a single numeric abstract element, a set of constraints over the numeric leaves of an infinite set of unbounded trees of arbitrary shape.


5.1 Hole Positions and Numerical Constraints 


The presentation of our computable abstraction able to represent numerical values in trees is broken down (for presentation purposes) into two consecutive abstractions. The first one is not computable, as natural terms are abstracted as partial environments over tree paths to numerical values. This abstraction loses most of the tree shapes but focuses on their numerical environment. A second abstraction will show how partial environments over paths are abstracted into numerical abstract elements defined over a regular expression environment.

In the following, when $\mathcal{R}$ is a ranked alphabet of maximum arity n, we call words sequences of integers: $w = (w_0, \dots, w_{p-1}) \in \{0, \dots, n-1\}^p$ will be called a word of length p (denoted $|w|$), $w_i$ denotes the i-th integer of the sequence, $\overline{w} = (w_1, \dots, w_{p-1})$ is the tail of word w, and $W(\mathcal{R}) = \{0, \dots, n-1\}^*$ is the set of all words over $\{0, \dots, n-1\}$ of arbitrary size.


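Definition 13 (Position in a term). Given a natural term t and a word w we inductively define the subterm of t at position w (denoted $t|_w$) to be:

$$t|_w = \begin{cases} (t_{w_0})|_{\overline{w}} & \text{when } |w| > 0 \land t = f(t_0, \dots, t_{p-1}) \text{ with } w_0 < p \\ t & \text{when } |w| = 0 \\ \text{undefined} & \text{otherwise} \end{cases}$$

Moreover we denote by $numeric(t) = \{ w \in \mathbb{N}^* \mid t|_w \in \mathbb{Z} \}$.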


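Definition 14 (Positioning lattice with exact numerical constraints). We define $C(\mathcal{R}) = \wp(W(\mathcal{R}) \rightharpoonup \mathbb{Z})$; an element of $C(\mathcal{R})$ is therefore a set of partial maps that are acceptable bindings of positions to integers.


Proposition 7 (Galois connection with natural terms). When t is a natural term, $t_{\mathbb{Z}}$ is the partial map $\lambda w \in numeric(t).\ t|_w$. We have the following Galois connection: $(\wp(T_{\mathbb{Z}}(\mathcal{R})), \subseteq) \underset{\alpha_{C(\mathcal{R})}}{\overset{\gamma_{C(\mathcal{R})}}{\leftrightarrows}} (C(\mathcal{R}), \subseteq)$, with:

$$\gamma_{C(\mathcal{R})}(Y) = \{ t \in T_{\mathbb{Z}}(\mathcal{R}) \mid t_{\mathbb{Z}} \in Y \} \qquad \alpha_{C(\mathcal{R})}(T) = \{ t_{\mathbb{Z}} \mid t \in T \}$$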


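Example 12. Consider our running example (introduced in Example 2), $V = \{ +(x, +(z, y)) \mid x < y \land z < y \}$; we have $\alpha_{C(\mathcal{R})}(V) = \{ (\langle 0 \rangle \mapsto \alpha, \langle 1, 0 \rangle \mapsto \gamma, \langle 1, 1 \rangle \mapsto \beta) \mid \alpha < \beta \land \gamma < \beta \}$, the concretization of which is exactly V.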
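
The position function of Definition 13 and the partial map associated with a term can be sketched as follows. The encoding is an assumption of this illustration: a natural term is either a Python integer or a nested tuple `(head, son_0, ..., son_{p-1})`, a word is a tuple of son indices, and an undefined position is reported as `None`.

```python
def subterm(t, w):
    """Subterm of t at position w, or None when the position is undefined."""
    if len(w) == 0:
        return t
    if isinstance(t, tuple) and 0 <= w[0] < len(t) - 1:
        # t[0] is the head symbol, so son number i is stored at index i + 1.
        return subterm(t[1 + w[0]], w[1:])
    return None


def numeric_map(t, prefix=()):
    """Partial map from the positions of integer leaves to their values."""
    if isinstance(t, int):
        return {prefix: t}
    result = {}
    for i, son in enumerate(t[1:]):
        result.update(numeric_map(son, prefix + (i,)))
    return result


# One element of V, with x = 1, z = 2 and y = 3.
TERM = ("+", 1, ("+", 2, 3))
```

On this term the map binds the three paths (0), (1, 0) and (1, 1) to 1, 2 and 3 respectively; a term without any integer leaf is mapped to the empty map.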


Example 13. Consider however the ranked alphabet $\{f(2), g(2), a(0)\}$, and the tree a. Its abstraction contains only the empty map, the concretization of which is the set of all terms that do not contain any numerical value. For example: f(g(a, a), a), g(a, a), .... This emphasizes that we lose information on:

- the labels in the natural terms: we only have the path from the root of the term to leaves with numerical labels, not the actual symbols along the path.

- the shape of the natural terms: we do not keep any information on subterms that do not contain numerical values.


Now that we have abstracted away the shape of the terms, we are left with numerical environments with potentially infinite dimensions (that are words over the alphabet $\{0, \dots, n-1\}$) and different definition sets. Therefore, following the idea of Sect. 4, we want to define a summarization for sets of words over the alphabet $\{0, \dots, n-1\}$. A summarization of such a language can be expressed as a partition into sub-languages. The set of regular languages over the alphabet $\{0, \dots, n-1\}$ is a subset of the set of languages over this alphabet that is closed under common set operations. Hence given a set $\{r_1, \dots, r_m\}$ of regular expressions (with respective recognized languages $\{L_1, \dots, L_m\}$), we summarize all words in $L_i$ inside a common variable $r_i$, and therefore $f[r_1, \dots, r_m]$ denotes the summarization function $\lambda r_i . L_i$. In the following, $\mathrm{Reg}_n$ denotes the set of regular expressions over the alphabet $A_n = \{0, \dots, n-1\}$. As for tree regular expressions, $(\mathrm{Reg}_n, \subseteq, \cap, \cup, \cdot^c, \emptyset, A_n^*)$ is a (non complete) complemented lattice of infinite height, upon which we can define a widening operator $\nabla$ (see [10]) in a similar manner as for tree regular expressions (this widening is also parameterized by an integer constant). We recall moreover that operators $\subseteq$, $\cap$, $\cup$ and complementation ($\cdot^c$) are computable, and that every finite set of words is regular. Moreover we have the following representation: $(\wp(A_n^*), \subseteq) \xleftarrow{\ \gamma_{\mathrm{Reg}_n}\ } (\mathrm{Reg}_n, \subseteq)$.

Finally, in order to disambiguate regular expressions from integers we will typeset them within $\lfloor \cdot \rfloor$ in a bold font as in: $\lfloor \mathbf{0 + 0.1^*} \rfloor$.


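Example 14. Using notations from Sect. 4.2, $\mathcal{V}' = \mathrm{Reg}_n$ and $\mathcal{V} = W(\mathcal{R})$. Consider our running example (introduced in Example 2): natural terms from $V = \{ +(x, +(z, y)) \mid x < y \land z < y \}$ contain three paths to numerical values: $\langle 0 \rangle$, $\langle 1, 0 \rangle$ and $\langle 1, 1 \rangle$. Numerical constraints on $\langle 0 \rangle$ and $\langle 1, 0 \rangle$ are similar, therefore the two paths are summarized into one regular expression: $\lfloor \mathbf{0 + 1.0} \rfloor$; $\langle 1, 1 \rangle$ is left alone in its regular expression: $\lfloor \mathbf{1.1} \rfloor$. The two constraints $x < y \land z < y$ can now be expressed as one: $\lfloor \mathbf{0 + 1.0} \rfloor < \lfloor \mathbf{1.1} \rfloor$.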
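
To see how one summarized constraint stands for several concrete ones, the following sketch groups the numerical leaves of a term by the regular expression matching their path, using Python's standard `re` module. The encoding is an assumption: a path is written as a string of digits (which only works for arities up to 10), the two partitions are written in Python regex syntax, and the helper names are hypothetical.

```python
import re

# Partitions of Example 14, keyed by their notation in the text.
PARTITIONS = {"0+1.0": "0|10", "1.1": "11"}


def summarize(num_map, partitions):
    """Group leaf values by the partition whose language contains their path."""
    groups = {name: [] for name in partitions}
    for path, value in num_map.items():
        word = "".join(str(i) for i in path)
        for name, regex in partitions.items():
            if re.fullmatch(regex, word):
                groups[name].append(value)
    return groups


def all_less(groups, left, right):
    """Summarized constraint left < right: it must hold for every pair."""
    return all(a < b for a in groups[left] for b in groups[right])


# Path-to-value map of the term +(1, +(2, 3)).
GOOD = {(0,): 1, (1, 0): 2, (1, 1): 3}
# Path-to-value map of the term +(5, +(2, 3)), which is not in V.
BAD = {(0,): 5, (1, 0): 2, (1, 1): 3}
```

The summarized constraint holds on the first map and fails on the second, because the value at path (0) is no longer below the value at path (1, 1).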


In Example 14, we saw that tree paths with similar numerical constraints can 
be summarized in one regular expression. However, for precision purposes, we 
do not want to summarize all tree paths into one regular expression. Hence, we 
will keep several disjoint regular expressions, which we call a subpartitioning. 


Definition 15 (Subpartitioning). Given a regular expression s, a subpartitioning of s is a set $\{s_1, \dots, s_n\}$ of regular expressions such that $\forall i \neq j, s_i \cap s_j = \emptyset$ and $\bigcup_i s_i \subseteq s$. We note P(s) the set of all subpartitionings of s. Moreover if $S = \{s_1, \dots, s_n\}$ is a set of regular expressions, $[S]_\emptyset = S \setminus \{\emptyset\}$.


Remark 7. Contrary to a partitioning of s, we do not require that the set of 
partitions covers s. Indeed when a set of tree paths is unconstrained we can 
just remove it from the partitioning, therefore no dimension in the numerical 
abstract environment will be allocated for this path. 

Fig. 6. Unification operator


Definition 16 (Positioning lattice with numerical abstraction). Given a ranked alphabet $\mathcal{R}$, where the maximum arity of symbols is n, we define $C^\sharp(\mathcal{R}) = \{ (s, p, R^\sharp) \mid s \in \mathrm{Reg}_n, p \in P(s), R^\sharp \in \mathfrak{M}^\sharp \}$. Therefore elements of $C^\sharp(\mathcal{R})$ are triples containing:

- s: (called support) a regular expression coding for positions at which numerical values can be located.

- p: a subpartitioning of s. Elements of the same partition are subject to the same numerical constraints. Note that these partitions are regular.

- $R^\sharp$: an abstract numeric element where a dimension is associated to each partition; this dimension plays the role of a summary dimension.


Remark 8. In the following, numerical abstract elements described in the form 
{c}, where c is a set of constraints, refer to (c, vars(c), vars(c)) € WË. 


Algorithm 2. unify_join operator

    Input : (s, {p_1, ..., p_n}, R♯), (s', {p'_1, ..., p'_m}, R♯') two abstract elements
    Output: two unified abstract elements

    (c_{i,j})_{i≤n, j≤m} ← p_i ∩ p'_j;
    (p_i)_{i≤n} ← p_i ∩ s'^c;
    (p'_j)_{j≤m} ← p'_j ∩ s^c;
    (q_i)_{i≤n} ← p_i ∩ s' ∩ (∪_{j≤m} c_{i,j})^c;
    (q'_j)_{j≤m} ← p'_j ∩ s ∩ (∪_{i≤n} c_{i,j})^c;
    R♯_1 ← R♯;
    R♯_2 ← R♯';
    for i = 1 to n do
        R♯_1 ← expand(R♯_1, p_i, [{c_{i,j}}_{j≤m} ∪ {p_i} ∪ {q_i}]_∅);
    for j = 1 to m do
        R♯_2 ← expand(R♯_2, p'_j, [{c_{i,j}}_{i≤n} ∪ {p'_j} ∪ {q'_j}]_∅);
    return (s, ∪_{i≤n, j≤m} [{q_i, p_i, c_{i,j}}]_∅, R♯_1), (s', ∪_{i≤n, j≤m} [{q'_j, p'_j, c_{i,j}}]_∅, R♯_2);
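
Unification. The previous definition shows that two elements $U^\sharp = (s, p, R^\sharp)$ and $V^\sharp = (s', p', R^{\sharp\prime})$ can have different subpartitionings (p and $p'$). However the partitions in p and in $p'$ might overlap, thus giving constraints to similar tree paths. Therefore in order to define the classical operators $\sqsubseteq$, $\sqcup$ and $\nabla$, we need to unify the two abstract elements ($U^\sharp$ and $V^\sharp$) so that, given a tree path and the partition in which it is contained in $U^\sharp$, it is contained in the same partition in $V^\sharp$. This will enable us to rely on abstract operators on the numerical domain. In order to perform unification, we rely on the expand and fold operators. Indeed consider our running example, $U^\sharp = (\lfloor \mathbf{0 + 1} \rfloor, \{\lfloor \mathbf{0} \rfloor, \lfloor \mathbf{1} \rfloor\}, \{\lfloor \mathbf{0} \rfloor < \lfloor \mathbf{1} \rfloor\})$ and $V^\sharp = (\lfloor \mathbf{0 + 1.(0 + 1)} \rfloor, \{\lfloor \mathbf{0 + 1.0} \rfloor, \lfloor \mathbf{1.1} \rfloor\}, \{\lfloor \mathbf{0 + 1.0} \rfloor < \lfloor \mathbf{1.1} \rfloor\})$. We see that constraints on tree path $\langle 0 \rangle$ are given in $U^\sharp$ by partition $\lfloor \mathbf{0} \rfloor$ and in $V^\sharp$ by partition $\lfloor \mathbf{0 + 1.0} \rfloor$. However we can split the partition $\lfloor \mathbf{0 + 1.0} \rfloor$ into two partitions, $\lfloor \mathbf{0} \rfloor$ and $\lfloor \mathbf{1.0} \rfloor$, and expand variable $\lfloor \mathbf{0 + 1.0} \rfloor$ into the two variables $\lfloor \mathbf{0} \rfloor$ and $\lfloor \mathbf{1.0} \rfloor$ in the numeric component: $\mathrm{expand}(\{\lfloor \mathbf{0 + 1.0} \rfloor < \lfloor \mathbf{1.1} \rfloor\}, \lfloor \mathbf{0 + 1.0} \rfloor, \{\lfloor \mathbf{0} \rfloor, \lfloor \mathbf{1.0} \rfloor\}) = \{\lfloor \mathbf{0} \rfloor < \lfloor \mathbf{1.1} \rfloor, \lfloor \mathbf{1.0} \rfloor < \lfloor \mathbf{1.1} \rfloor\}$. Once $U^\sharp$ and $V^\sharp$ are unified we can rely on the numerical join to soundly abstract the union.

Note that splitting partitions is more precise than merging them. Indeed, consider the example where in $U^\sharp$ we have $\lfloor \mathbf{0} \rfloor \geq 0$ and $\lfloor \mathbf{1} \rfloor \leq 0$, and in $V^\sharp$ we have $\lfloor \mathbf{0 + 1} \rfloor = 0$. Splitting the partition in $V^\sharp$ yields $\lfloor \mathbf{0} \rfloor = 0$, $\lfloor \mathbf{1} \rfloor = 0$; after joining we get $\lfloor \mathbf{0} \rfloor \geq 0$, $\lfloor \mathbf{1} \rfloor \leq 0$. Whereas merging partitions in $U^\sharp$ yields $\lfloor \mathbf{0 + 1} \rfloor$ unconstrained; after joining we also get that $\lfloor \mathbf{0 + 1} \rfloor$ is unconstrained. However unifying by splitting or merging partitions in both abstract elements might result in an over-approximation of the initial elements. This does not pose a threat to the soundness of the join operator, but it does for the inclusion test. Unifying by splitting partitions induces an increase in the number of partitions, which we want to avoid when trying to stabilize abstract elements in the widening. Hence, we define three unification operators: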


— An operator unify_join that splits partitions from U♯ and V♯. This operator
might induce an over-approximation for both U♯ and V♯ and is used in the
join operation. It is presented in Algorithm 2 and illustrated in Fig. 6.

— An operator unify_subset that does not modify V♯ (in order to avoid
over-approximating it). We only split and merge (using the fold operator)
partitions from U♯ since, if the over-approximated U♯ is smaller than V♯, then
so is the original U♯.

— An operator unify_widen that unifies U♯ and V♯ by only merging partitions,
so that the number of partitions does not increase. This operator is used in
the widening definition.


Operators unify_subset and unify_widen are very similar to unify_join. 


Definition 17 (Comparison ⊑_C♯(R)). Using unify_subset we define a relation
on C♯(R): ⊑_C♯(R) = {(U♯, V♯) | ((s, p, N♯), (s′, p′, N′♯)) = unify_subset(U♯, V♯)
⇒ s ⊆ s′ ∧ ∀b ∈ p′, (b ⊈ sᶜ ⇒ ∃a ∈ p, b ∩ s = a) ∧ N♯ ⊑ N′♯[φ]}, where φ is
the renaming from p′ into p that renames b to a when such an a exists.
Example 15. Going back to our running example: U♯ = (⌊0 + 1⌋, {⌊0⌋, ⌊1⌋},
{⌊0⌋ < ⌊1⌋}(= A♯)) and V♯ = (⌊0 + 1.(0 + 1)⌋, {⌊0 + 1.0⌋, ⌊1.1⌋}, {⌊0 + 1.0⌋ <
⌊1.1⌋}). We have s ⊈ s′, hence U♯ ⋢ V♯. However, if we now consider W♯:
(⌊(ε+1).(0+1)⌋, {⌊(ε+1).0⌋, ⌊(ε+1).1⌋}, {⌊(ε+1).0⌋ < ⌊(ε+1).1⌋}(= B♯)), W♯
is already unified with U♯, we have s ⊆ s′ and φ: (⌊(ε+1).0⌋ ↦ ⌊0⌋, ⌊(ε+1).1⌋ ↦
⌊1⌋). Moreover A♯ ⊑ B♯[φ] = {⌊0⌋ < ⌊1⌋}. Hence U♯ ⊑ W♯.


Proposition 8. We have: (C(R), Eecr)) na (C!(R), Eerr)), where: y1((s,p, 
R*)) = {f | wae ) C YReg, (S) A f E ait p|(R')}. By composition we get: 
(o(TZ(R)), C) @ (CHR), Ecir), with y2 = VCR) ° V1- 


Example 16. Going back to our running example: V♯ = (⌊0 + 1.(0 + 1)⌋,
{⌊0 + 1.0⌋, ⌊1.1⌋}, {⌊0 + 1.0⌋ < ⌊1.1⌋}). The partitions denote the following
sets of paths: (⌊0 + 1.0⌋ ↦ {⟨0⟩, ⟨1, 0⟩}, ⌊1.1⌋ ↦ {⟨1, 1⟩}). Hence,
γ1(V♯) = {(⟨0⟩ ↦ α, ⟨1, 1⟩ ↦ β) | α < β} ∪ {(⟨1, 0⟩ ↦ α, ⟨1, 1⟩ ↦ β) | α < β}
∪ {(⟨0⟩ ↦ α, ⟨1, 0⟩ ↦ γ, ⟨1, 1⟩ ↦ β) | α < β ∧ γ < β}. The product with tree
automata refines this result so that only the last set is left.


We now define the U operator that relies on the unify_join operator of Algo- 
rithm 2. Once elements are unified we can distinguish three kinds of partitions: 
(1) Partitions found in both abstract elements (e.g. % in Fig. 6). (2) Partitions 
found in only one of the two, which do not overlap the support of the other
abstract element (denoted u°); these are outer-partitions. Information on such
partitions can be soundly kept when joining two abstract elements (e.g. partition
a in Fig. 6). (3) Partitions found in only one of the two, which overlap the
support of the other abstract element; these are inner-partitions. Information
on such partitions cannot be soundly kept when joining two abstract elements
(e.g. partition b in Fig. 6). Therefore, in the following definition of the join
operator, we compute (once elements are unified) the common partitions and both
sets of outer-partitions, and merge them to form the resulting subpartitioning.


Definition 18 (Union abstract operator). Given U♯, V♯ ∈ C♯(R), if
((s, p, R♯), (s′, p′, R′♯)) = unify_join(U♯, V♯), let c be p ∩ p′, let u° (U♯
outer-partition) be {e ∈ p | e ⊆ s′ᶜ}, let v° (V♯ outer-partition) be
{e ∈ p′ | e ⊆ sᶜ}, we then define:

U♯ ⊔_C♯(R) V♯ = (s ∪ s′, c ∪ u° ∪ v°, R♯|c∪u° ⊔ R′♯|c∪v°)


Proposition 9. We have: γ1(U♯) ∪ γ1(V♯) ⊆ γ1(U♯ ⊔_C♯(R) V♯).


Example 17. Consider the two following abstract elements (this is the
particular case of our running example where all numerical values are equal):
V♯ = (⌊0 + 1.(0 + 1)⌋(= s), {⌊0 + 1.0⌋(= a), ⌊1.1⌋(= b)}, {a = b}), and U♯ =
(⌊0 + 1⌋(= s′), {⌊0⌋(= c), ⌊1⌋(= d)}, {c = d}). Intuitively U♯ could encode the
term (a + 2) and V♯ the term (a + (x + 2)). The unification of those two elements
is: V₁♯ = (s, {c, b, ⌊1.0⌋(= e)}, R♯) where R♯ = ({c = b, e = b}, {b}, {c, b, e}) and
U₁♯ = U♯; moreover the common environment (c in previous definition) is: {c},

Fig. 7. Widening illustration


V♯ outer-partitioning is {e, b}, U♯ outer-partitioning is {d}. Hence the numerical
component resulting from the join is: ({c = d}, {c, d}, {c, d}) ⊔ ({c = b, e = b},
{b}, {c, b, e}), which is: ({c = b, e = b, c = d}, ∅, {c, d, e, b}). We see here that
using a naive numerical join operator, we would not have been able to get such
a precise result (the numerical join would have yielded ⊤).
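
The classification of partitions used by the join can be sketched as follows. This is only an illustration under simplifying assumptions: supports and partitions are finite sets of paths (tuples of integers) rather than regular languages, the elements are assumed to be already unified, and the function name classify is hypothetical.

```python
def classify(s_u, p_u, s_v, p_v):
    """Split unified subpartitionings into common and outer partitions.

    s_u, s_v: supports, as sets of paths (tuples of child indices).
    p_u, p_v: subpartitionings, as sets of frozensets of paths.
    A partition is "outer" when it does not meet the other element's support;
    partitions that are neither common nor outer are inner and are dropped.
    """
    common = p_u & p_v
    u_outer = {e for e in p_u if e.isdisjoint(s_v)}
    v_outer = {e for e in p_v if e.isdisjoint(s_u)}
    return common, u_outer, v_outer, common | u_outer | v_outer


# Example 17 after unification: U has partitions c, d and V has c, b, e.
c = frozenset({(0,)})
d = frozenset({(1,)})
b = frozenset({(1, 1)})
e = frozenset({(1, 0)})
s_u = {(0,), (1,)}
s_v = {(0,), (1, 0), (1, 1)}
common, u_outer, v_outer, joined = classify(s_u, {c, d}, s_v, {c, b, e})
```

On this input the resulting subpartitioning contains the four partitions c, d, e and b, matching the variables of the numerical component obtained in Example 17.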


unify_widen. C♯(R) contains infinite increasing chains; therefore, we need to
provide a widening operator. As for the other operators, widening is computed
on unified abstract elements. A unify_widen operator is defined: it produces U₁♯
and V₁♯, over-approximations of its inputs with the same number of partitions.
Moreover, it ensures that each partition of U₁♯ intersects exactly one partition of
V₁♯. This can be obtained by iteratively merging partitions that overlap in both
arguments until the abstract elements have the exact same partitions. Therefore,
from the result of unify_widen we can extract a list of pairs (a, b) where a is a
partition from U₁♯, b is a partition from V₁♯ and a ∩ b ≠ ∅. This defines a bijection
from the partitions of U₁♯ onto the partitions of V₁♯.


compose. In order to ensure stabilization we first need to stabilize the supports
on which abstract elements are defined. This is easily done using the automaton
widening (s1 ∇ s2 in Algorithm 3). Figure 7 illustrates the following simple
example: U♯ is an abstract element with support ⌊0 + 1⌋, two partitions u = ⌊0⌋
and u′ = ⌊1⌋, and numerical constraints u′ = 1 and u = 0. V♯ is an abstract
element with support ⌊(ε + 1).(0 + 1)⌋, two partitions v = ⌊(ε + 1).0⌋ and
v′ = ⌊(ε + 1).1⌋, with the numerical constraints v = 0 and v′ = 1. Supports
are unstable, therefore we start by widening them, which yields a new
support: ⌊1*.(0 + 1)⌋. The unification of U♯ and V♯ leaves subpartitionings
unchanged and yields the bijection (u ↦ v, u′ ↦ v′). Given this information
we now need to provide a new subpartitioning for the result of the widening.
We see in this example that we could soundly use the subpartitioning from V♯;
this would produce the abstract element Z₁♯ depicted in Fig. 7. However, due to
the widening of the support, paths of the form ⟨1, 1, 1, 0⟩ are in the support of
the result but are left unconstrained as they are not in any of the partitions.
Therefore we need to use the opportunity of the extension of the support to
place constraints on the newly added paths. In order to do so we would like to
force the extension of the existing partitions from U♯ and V♯ into the new
support. Therefore we need to define a compose operator that produces a sound
new partition, given: (1) a pair a, b of partitions (such as the one produced by
Algorithm 3. widening operator

Input: U♯, V♯ two abstract elements
((s1, p1, R1♯), (s2, p2, R2♯)) = unify_widen(U♯, V♯);
s ← s1 ∇ s2;
r ← s \ (s1 ∪ s2);
foreach a ∈ p1 do
    b ← the unique element from p2 such that b ∩ a ≠ ∅;
    p̂ ← compose(a, b, s1, s2, r);
    p ← {p̂} ∪ p;
    R1♯ ← R1♯[a ↦ p̂];
    R2♯ ← R2♯[b ↦ p̂];
    r ← r \ p̂;
if p = p1 then
    return (s, p, R1♯ ∇ R2♯);
else
    return (s, p, R1♯ ⊔ R2♯);


unify_widen), (2) the support s1 (resp. s2) in which a (resp. b) lives and (3)
a space to occupy r. The following criteria must be satisfied by the resulting
partition p in order to be sound and to terminate: p ∩ s1 = a, p ∩ s2 = b and
p \ (s1 ∪ s2) ⊆ r. A variety of compose operators could be defined; we chose:
compose(a, b, s1, s2, r) = a ∪ (b ∩ (s2 \ s1)) ∪ ((a ∇ (a ∪ b)) ∩ r). The idea is
the following: we keep a (as it is always sound thanks to the definition of the
unify_widen operator), we keep the part of b that satisfies the soundness
condition, and we extend into the space left to occupy according to the automata
widening of a and a ∪ b. In our example, considering the pair (u, v), this
translates as: a = ⌊0⌋, b ∩ (s2 \ s1) = ⌊1.0⌋ and (a ∇ (a ∪ b)) ∩ r =
(⌊0⌋ ∇ ⌊(ε + 1).0⌋) ∩ ⌊1.1.1*.(0 + 1)⌋ = ⌊1.1.1*.0⌋. We get the new partition
⌊1*.0⌋. Doing the same with the pair (u′, v′) yields ⌊1*.1⌋. Finally we get the
abstract element Z₂♯ from Fig. 7, which is more precise than Z₁♯.
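
To make the compose formula concrete, the following Python sketch replays the example on finite sets of paths. It is a sketch under stated assumptions, not the actual operator: regular languages are truncated to paths with at most K leading 1s, and the automaton widening is replaced by a hypothetical stand-in (stand_in_widen) that directly returns the truncation of ⌊1*.0⌋.

```python
K = 4  # truncation bound on the number of leading 1s (assumption of this sketch)


def ones_then(last, k_max=K):
    """Finite truncation of the language 1*.last as a set of paths."""
    return {(1,) * k + (last,) for k in range(k_max + 1)}


def compose(a, b, s1, s2, r, widen):
    """compose(a, b, s1, s2, r) = a ∪ (b ∩ (s2 \\ s1)) ∪ ((a ∇ (a ∪ b)) ∩ r)."""
    return a | (b & (s2 - s1)) | (widen(a, a | b) & r)


def stand_in_widen(x, y):
    # Hypothetical replacement for the automaton widening of 0 and (ε+1).0.
    return ones_then(0)


s1 = {(0,), (1,)}                     # support of U
s2 = {(0,), (1,), (1, 0), (1, 1)}     # support of V
s = ones_then(0) | ones_then(1)       # widened support 1*.(0+1), truncated
r = s - (s1 | s2)                     # space left to occupy
a = {(0,)}                            # partition u
b = {(0,), (1, 0)}                    # partition v
p = compose(a, b, s1, s2, r, stand_in_widen)
```

The resulting p is the truncation of ⌊1*.0⌋, and it satisfies the three criteria stated above (p ∩ s1 = a, p ∩ s2 = b and p \ (s1 ∪ s2) ⊆ r).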


Definition 19 (Widening). Algorithm 3 provides the definition of a widen- 
ing operator using the unify_widen operator and parameterized by a compose 
function. 


Widening Stabilization. Our abstraction contains three components: (1) a sup- 
port that describes the set of paths (2) a subpartitioning of this support and (3) 
a numerical component giving constraints on partitions in the subpartitioning. 
We show how the widening operator stabilizes all three components. 


— Regular expression widening is used on supports when widening is called,
thereby ensuring support stabilization.

— Once supports are stable (this means s2 ⊆ s1), we have p = a for every pair
(a, b) of partitions. This means that once shapes stabilize, the only modifications
allowed on the subpartitionings are those made by the unify_widen operator.
Each partition resulting from the operator is the union of input partitions,
hence the subpartitioning will stabilize.

— Once subpartitionings are stable (p1 = p in Algorithm 3), numerical widening
is applied on the numerical component in order to ensure stabilization.


Example 18 (Numerical example). Consider the simple example where: R =
{f(2)}, U♯ = (⌊0 + 1⌋, {⌊0⌋, ⌊1⌋}, {⌊1⌋ = ⌊0⌋}) and V♯ = (⌊0 + 1⌋, {⌊0⌋, ⌊1⌋},
{⌊1⌋ > ⌊0⌋, ⌊1⌋ < ⌊0⌋ + 1}). U♯ and V♯ have the same shape, therefore widening
is performed on the numerical component of the abstraction, which gives:
U♯ ∇ V♯ = (⌊0 + 1⌋, {⌊0⌋, ⌊1⌋}, {⌊1⌋ = ⌊0⌋}).


Reducing Dimensionality and Improving Precision. As emphasized by the
previous examples, definitions and illustrations, the numerical component of an
abstract state is used as a container for constraints on regular expressions: every
node in a regular expression must satisfy all numerical constraints on the
underlying regular expression. Therefore, when two nodes of a tree satisfy the
same constraints, they should be stored in the same partition so as to reduce the
dimension of the numerical domain (thus improving efficiency). Moreover, the
widening operator provided in Algorithm 3 relies (for precision) on the fact that
partitions are built by similarity of constraints; therefore partition merging, when
it does not result in an over-approximation, also leads to a precision gain. The
unification operator defined in Algorithm 2 tends to split partitions whereas the
widening operator defined in Algorithm 3 tends to merge them. In order to reduce
dimensionality, we would like to define a reduce : C♯(R) → C♯(R) operator that
folds variables with similar constraints into one. Please note that folding can
lose information: whenever both sides are defined, we have
R♯ ⊑ expand(fold(R♯, x, S′), x, S′).
This means that when variables are folded into one, expanding them afterwards
would yield a bigger abstract element. For example, consider the octagon
R♯ = {x > 2, y > 2, x = y}; then fold(R♯, z, {x, y}) = {z > 2} (= R′♯)
and expand(R′♯, z, {x, y}) = {x > 2, y > 2}. However, if we consider R♯ =
{x > 2, y > 2}, then expand(fold(R♯, z, {x, y}), z, {x, y}) = R♯. Therefore, if
we assume given a score function score(R♯, x, S′) ranging in [0, 1] such that
score(R♯, x, S′) = 1 ⇒ R♯ = expand(fold(R♯, x, S′), x, S′), we are able to
define a generic reduce operator parameterized by a value α. This reduce
operator merges partitions until no set of partitions has a high enough
score according to the score function. Finding a good score function is
work in progress. As a first approximation we used the following trivial one:
score₀(R♯, x, S) = 1 when expand(fold(R♯, x, S), x, S) = R♯ and 0 otherwise.
This score₀ guarantees there is no loss of precision, but can miss opportunities
for simplification.


Example 19. Consider the following example: U♯ = (⌊0 + 1⌋, {⌊0⌋, ⌊1⌋}, {⌊0⌋ =
0, ⌊1⌋ = 0}). Relations on ⌊0⌋ and ⌊1⌋ can be expressed in one relation using
the summarizing variable ⌊0 + 1⌋. This yields: reduce(U♯) = (⌊0 + 1⌋, {⌊0 +
1⌋}, {⌊0 + 1⌋ = 0}). Note that expand({⌊0 + 1⌋ = 0}, ⌊0 + 1⌋, {⌊1⌋, ⌊0⌋}) =
{⌊0⌋ = 0, ⌊1⌋ = 0}. Therefore no information is lost.
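
The trivial score function can be illustrated with a small Python sketch. It is an approximation of the setting, not the implemented domain: the numerical component is modelled by a non-relational interval environment (a dict from variable names to (low, high) pairs), so relational losses such as the dropped x = y of the octagon example cannot be shown. The names fold, expand and score0 mirror the text, but their signatures are assumptions of this sketch.

```python
import math


def fold(env, z, group):
    """Merge the variables of `group` into the summary variable z (interval hull)."""
    lo = min(env[v][0] for v in group)
    hi = max(env[v][1] for v in group)
    out = {v: itv for v, itv in env.items() if v not in group}
    out[z] = (lo, hi)
    return out


def expand(env, z, group):
    """Replace z by one copy per variable of `group`, each with z's interval."""
    out = {v: itv for v, itv in env.items() if v != z}
    for v in group:
        out[v] = env[z]
    return out


def score0(env, z, group):
    """1 when folding then expanding gives back env exactly, 0 otherwise."""
    return 1 if expand(fold(env, z, group), z, group) == env else 0


# Example 19: both partitions are constrained to 0, so folding is lossless.
u = {"0": (0, 0), "1": (0, 0)}
reduced = fold(u, "0+1", ["0", "1"])

# Different bounds: folding would lose the tighter bound on y.
w = {"x": (2, math.inf), "y": (3, math.inf)}
```

Here score0(u, "0+1", ["0", "1"]) is 1, so the two partitions can be merged, whereas score0(w, "z", ["x", "y"]) is 0 and the partitions are kept apart.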




Abstract Semantics of Operators. As for tree automata, the abstract semantics
of the operators defined in Sect. 2 can be defined as simple transformations on
regular automata. Indeed, the make_symbolic(s ∈ R) (resp. get_son) operator
amounts to adding (resp. removing) an integer letter to: (1) the partitions in the
subpartitioning and (2) the support. make_integer(e ∈ expr) amounts to building
an abstract element with support ⌊ε⌋ and the subpartitioning {⌊ε⌋},
on which we put the constraint that it is equal to e. is_symbol only needs to split
the support and each partition into the two languages L = {ε} and A_n* \ L. Indeed,
in order to restrict to terms having only an integer as root, the support must
be reduced to ε. The get_sym_head operator always yields the whole ranked
alphabet (as this was abstracted away and will be refined by the automaton
abstraction). Finally, for get_num_head: (1) if the empty path ⟨⟩ is in the
support we produce the set of integers satisfying the numerical constraints on the
partition containing ε, and ⊤ in case no such partition could be found, and (2)
otherwise we know that no numerical value is produced.


5.2 Product of Tree Automata and Numerical Constraints 


The abstraction by tree automata defined in Sect. 3 and the abstraction by
numerical constraints on tree paths defined in Sect. 5.1 provide incomparable
information on the set of terms they abstract. Indeed, the former describes
precisely the shape of the term but cannot express numerical constraints, whereas
the latter abstracts away most of the shape and focuses on numerical constraints.
To benefit from both kinds of information, we use a reduced product between the
two domains. Both abstractions in the product contain information on potential
integer positions: the position of the □ symbol in the tree automaton abstraction
and the support in the numerical constraints abstraction both yield this
information. We remove the support component from the product as the
information can be retrieved from the tree abstraction. The definitions of the
abstract operators in Sect. 5.1 require the support to be a regular language. We
show in this subsection how to retrieve the support of a tree automaton with holes
and that it is regular.

Given an FTA (Q, R, Q_f, δ) over a ranked alphabet R with maximum arity
n, we assume that every state in Q is reachable. Consider the following system
over variables v_p for p ∈ Q with values in the set of languages over the alphabet
A_n (. designates the classical concatenation operator lifted to languages):


v_p = ( ⋃ { v_q.{i} | (s, (q_1, …, q_m), q) ∈ δ, q_i = p } ) ∪ ({ε} if p ∈ Q_f, Ø otherwise),
for every p ∈ Q


Every language {i} for i ∈ ℕ is regular and does not contain ε; moreover
Ø and {ε} are regular languages. By application of Arden's rule (see [18]) and
Gauss elimination we can compute the unique solution of this system; moreover
every v_p is regular. Variable v_p is defined so that: w ∈ v_p if and only if there
exists a tree t recognized by the automaton such that p ∈ REACH(t|w). For □ ∈ R
we have that the regular language ⋃ { v_p | (□, (), p) ∈ δ } represents exactly the
potential positions of integers in trees accepted by the tree automaton.
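
The meaning of the variables v_p can be illustrated by a bounded fixpoint computation. The Python sketch below does not apply Arden's rule: as a simplifying assumption it only enumerates the paths of length at most max_len, and it indexes children from 0 as in the path notation of the examples. The function names and the "HOLE" symbol standing for □ are hypothetical.

```python
def positions(transitions, finals, max_len):
    """Paths (up to max_len) at which each state can occur in an accepted tree.

    transitions: list of (symbol, children_states, target_state).
    finals: set of accepting states (these occur at the root, i.e. at path ()).
    """
    states = set(finals)
    for _sym, kids, q in transitions:
        states.add(q)
        states.update(kids)
    v = {p: ({()} if p in finals else set()) for p in states}
    changed = True
    while changed:
        changed = False
        for _sym, kids, q in transitions:
            for i, p in enumerate(kids):
                for w in list(v[q]):
                    # Child i of a node at path w sits at path w.i
                    if len(w) < max_len and w + (i,) not in v[p]:
                        v[p].add(w + (i,))
                        changed = True
    return v


def integer_support(transitions, finals, max_len, hole="HOLE"):
    """Union of v_p over the states p reached by the nullary hole symbol."""
    v = positions(transitions, finals, max_len)
    return set().union(*(v[p] for sym, kids, p in transitions
                         if sym == hole and kids == ()))


# Automaton of the C introductory example: *(d) -> c, +(c, a) -> d, hole -> a, p -> c.
delta = [("*", ("d",), "c"), ("+", ("c", "a"), "d"), ("HOLE", (), "a"), ("p", (), "c")]
support = integer_support(delta, {"c"}, max_len=4)
```

Up to length 4 the computed support is {⟨0, 1⟩, ⟨0, 0, 0, 1⟩}, which are the first two paths of ⌊0.(0.0)*.1⌋.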
Height and Size. The product is enriched with a simple height and size
abstraction: numerical variables (encoding heights and sizes) are added to the
numerical component of the abstraction.


5.3 Environment Abstraction 


In the previous section, we designed abstractions for sets of trees. However, in
order to tackle the examples from the introductory section (Sect. 1), we
need an abstraction able to represent maps from a set of variables to
natural terms. In Sect. 3 we have shown how to lift abstractions on natural terms
to abstractions of environments over a given finite set T of term variables.
We apply the same mechanism here to lift the product presented in Sect. 5.2.
However, lifting the product would result in abstract environments being maps
from natural term variables to abstractions containing a numerical environment.
In order to express numerical relations between two sets of natural
terms, or even between numerical program variables and numerical values of
natural terms, we factor away the numerical environment so that it is shared
by all natural term abstractions in the term environment and by the program
variables in the numerical environment. Therefore the final abstraction is a pair
(m, R♯) where: (1) m is a map from T to an abstract element that is a product
of the automaton abstraction and the hole positioning abstraction. Moreover,
as all the numerical constraints are stored in a common numerical environment,
the product abstraction amounts to a pair (A, p) where A is an element of the
automaton abstraction and p is a partitioning of its support. (2) R♯ is an element
of N♯ binding in the same numerical element the numerical program variables and
all partitions found in the mapping m.


6 Implementation and Example 


6.1 Implementation 


The analyzer was implemented in OCaml (~5000 loc) in the novel and still
in development MOPSA framework (see [21]). MOPSA enables a modular
development of static analyzers defined by abstract interpretation. An analyzer is
built by choosing abstract domains and combining them according to the user
specification. MOPSA comes with pre-existing iterators and domains (e.g.
inter-procedural analysis, loop iterators, numerical domains, ...), and new ones can
be added (e.g. tree abstract domain). A key feature of MOPSA is the ability
of an abstract domain to use the abstract knowledge it maintains to dynamically
transform expressions into other expressions that can be manipulated
more easily by further domains, providing a flexible way to combine relational
domains. For instance, assume that a domain abstracts arrays by associating
a scalar variable a0, a1, ..., to each element a[0], a[1], ..., of an array a,
and delegating the abstraction of the array contents to a numeric domain for
scalars. It can then evaluate E♯⟦2 × a[i] + i⟧(i ↦ [0, 1]) into the disjunction
(2 × a0 + i, i ↦ [0, 0]) ∨ (2 × a1 + i, i ↦ [1, 1]), indicating that 2 × a[i] + i is
equivalent to 2 × a0 + i in the sub-environment where i = 0 and to 2 × a1 + i in the
sub-environment where i = 1. Each term of the disjunction contains an array-free
expression that can be handled by the scalar domain in the corresponding
sub-environment. In the abstract, expressions can be evaluated by induction on the
syntax into symbolic expressions to retain the full power of relational domains
and disjunctive reasoning (see [21] for more details). We exploit this feature in
our implementation to combine our tree abstractions. We implemented (in the
MOPSA framework) libraries for regular and tree regular languages that offer the
usual lattice interface enriched with a widening operator. These libraries can be
reused for the definition of other abstract domains. The overall complexity of
the analysis is driven by the complexity of the lattice operations in the regular
and tree regular libraries. These are exponential in the number of states of the
considered automata, which is bounded by the widening parameter.
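
The array example above can be mimicked by a few lines of Python. This sketch is not MOPSA code (MOPSA is written in OCaml) and does not use its API: the function eval_array_expr, the string form of expressions and the (low, high) encoding of the interval of i are all assumptions made for illustration.

```python
def eval_array_expr(lo, hi):
    """Evaluate 2 * a[i] + i for i in [lo, hi] into array-free cases.

    Each case pairs an expression over the scalar variable a<k> with the
    sub-environment in which i is exactly k.
    """
    return [(f"2 * a{k} + i", {"i": (k, k)}) for k in range(lo, hi + 1)]


cases = eval_array_expr(0, 1)
```

Each element of cases can then be handed to a scalar numerical domain together with its sub-environment, which is the disjunctive evaluation described in the text.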


6.2 Examples of Analysis 


Numerical variables of the form t.x, where t is a natural term variable, represent 
a variable allocated for tree t. For example: t.r where r is a regular expression 
is the variable allocated for partition r in tree t. 


C Introductory Example. Let us consider the introductory example
Program 4. The loop invariant inferred with our analysis is the following
abstract element: U♯ = (y ↦ (A, {⌊0.(0.0)*.1⌋(= r)}), R♯), with A =
({a, b, c, d}, {*(1), +(2), □(0), p(0)}, {c}, {*(d) → c, +(c, a) → d, □() → a, p →
c}), and R♯ satisfies the constraints: {i > 0, i < n, y.r = 4}. This describes
precisely the set of terms of the form: p, *(p+4), *(*(p+4)+4), .... As mentioned in
Sect. 6.1, evaluations of tree expressions yield pairs containing an expression and
an abstract environment. Tree expressions are pairs (A, p); partitions in p are
bound by the adjoined environment. Let us now present the result of the
evaluation of the make_integer(4) expression in the abstract environment U♯. Here we
get the expression (A′, {⌊ε⌋}) (where A′ recognizes only □) in the environment:
(y ↦ (A, {r}), R′♯) where R′♯ = R♯ ∪ {⌊ε⌋ = 4}. This emphasizes how the
environment is used to give constraints on the adjoined expression. This transports
numerical relations from the leaves of the expression up to the assigned variable t.


OCaml Introductory Example. Let us now consider the introductory
example Program 5. The inferred loop invariant is the following (r = ⌊(1.1)*.0⌋
and r′ = ⌊(1.1)*.1.0⌋): (t ↦ (A, {r, r′}), R♯), where R♯ satisfies the
constraints: {t.r′ = x − 1, t.r = t.r′ + 2, i > 0, i < n} and A =
({a, b, c, d}, {Cons(2), Nil(0), □(0)}, {a}, {Cons(c, a) → d, Cons(c, d) → a, Nil →
a, □ → c}). Please note that at the end of the while loops the two numerical
environments that need to be joined are not defined over the same set of vari- 
ables (in the environments that have not gone through the loop, variables t.r’ 
and t.r are not present). However thanks to the # operator, we do not have to 
lose the numerical relations between these variables and x. Hence we are able
to prove that the assertion holds.

The analyzer was able to successfully analyze and infer the expected invari- 
ants for both examples. 


7 Related Works 


Previous works on sets of trees abstractions [20] were able to recognize larger 
classes of tree languages than tree automata. However we focused here on the 
abstraction of trees labeled with numerical values, therefore the work closest to 
ours would be [12]. Indeed it defines tree automata where leaves can be elements 
of a lattice (for example an interval). They are therefore able to represent sets 
of natural terms, but cannot express numerical relations between the leaves of
trees. Moreover they rely on a partitioning of the leaf lattice for tree automata 
operations. In [1] (and [2]) tree automata and regular automata are used for 
the model checking of programs manipulating C pointers and structures. Other 
uses have been made of tree automata in verification: shape analysis of C pro- 
grams as in [15], computation of an over-approximation of terms computable by 
attackers of cryptographic protocols as in [24]. Widening regular languages by 
the computation of an equivalence relation of bounded index is also done in [9] 
and in [11]. As mentioned, variable summarization is often used to represent 
unbounded memory locations as in [17] or [14]. Moreover numerical abstract 
domains able to handle optional variables have been defined such as [19]. Finally 
termination analyses have been proposed for the analysis of programs manipu- 
lating tree structures (AVL, red-black trees) see [16]. 


8 Conclusion 


In this article we presented a relational abstract environment for sets of trees over 
a finite algebra, with numerically labeled leaves. We emphasized the potential 
applications of being able to describe such trees: description of reachable memory 
zones, tracking symbolic equalities between program variables, description of
tree-like structures. In order to improve the precision of the analysis while not blowing
up its cost we defined a novel abstraction for sets of maps with heterogeneous 
supports. This numeric abstraction is able to represent optional dimensions in 
numerical domains without losing relations with optional variables. All domains 
presented in the article were implemented as a library in the MOPSA framework. 
Abstract. We revisit the static dependency pair method for proving
termination of higher-order term rewriting and extend it in a number 
of ways: (1) We introduce a new rewrite formalism designed for general 
applicability in termination proving of higher-order rewriting, Algebraic 
Functional Systems with Meta-variables. (2) We provide a syntactically 
checkable soundness criterion to make the method applicable to a large 
class of rewrite systems. (3) We propose a modular dependency pair 
framework for this higher-order setting. (4) We introduce a fine-grained 
notion of formative and computable chains to render the framework more 
powerful. (5) We formulate several existing and new termination proving 
techniques in the form of processors within our framework. 

The framework has been implemented in the (fully automatic) higher- 
order termination tool WANDA. 


1 Introduction 


Term rewriting [3,48] is an important area of logic, with applications in many
different areas of computer science [4,11,18,23,25,36,41]. Higher-order term
rewriting, which extends the traditional first-order term rewriting with higher-order
types and binders as in the λ-calculus, offers a formal foundation of functional
programming and a tool for equational reasoning in higher-order logic. A key
question in the analysis of both first- and higher-order term rewriting is
termination, both for its own sake and as part of confluence and equivalence analysis.

In first-order term rewriting, a hugely effective method for proving termina- 
tion (both manually and automatically) is the dependency pair (DP) approach 
[2]. This approach has been extended to the DP framework [20,22], a highly 
modular methodology which new techniques for proving termination and non- 
termination can easily be plugged into in the form of processors. 

In higher-order rewriting, two DP approaches with distinct costs and ben- 
efits are used: dynamic [31,45] and static [6,32-34,44,46] DPs. Dynamic DPs 
are more broadly applicable, yet static DPs often enable more powerful analy- 
sis techniques. Still, neither approach has the modularity and extendability of 
the DP framework, nor can they be used to prove non-termination. Also, these
approaches consider different styles of higher-order rewriting, which means that 
for all results certain language features are not available. 

In this paper, we address these issues for the static DP approach by extend- 
ing it to a full higher-order dependency pair framework for both termination and 
non-termination analysis. For broad applicability, we introduce a new rewriting 
formalism, AFSMs, to capture several flavours of higher-order rewriting, including
AFSs [26] (used in the annual Termination Competition [50]) and pattern
HRSs [37,39] (used in the annual Confluence Competition [10]). To show the 
versatility and power of this methodology, we define various processors in the 
framework — both adaptations of existing processors from the literature and 
entirely new ones. 


Detailed Contributions. We reformulate the results of [6,32,34, 44,46] into a DP 
framework for AFSMs. In doing so, we instantiate the applicability restriction of 
[32] by a very liberal syntactic condition, and add two new flags to track proper- 
ties of DP problems: one completely new, one from an earlier work by the authors 
for the first-order DP framework [16]. We give eight processors for reasoning in 
our framework: four translations of techniques from static DP approaches, three 
techniques from first-order or dynamic DPs, and one completely new. 

This is a foundational paper, focused on defining a general theoretical frame- 
work for higher-order termination analysis using dependency pairs rather than 
questions of implementation. We have, however, implemented most of these 
results in the fully automatic termination analysis tool WANDA [28]. 


Related Work. There is a vast body of work in the first-order setting regarding 
the DP approach [2] and framework [20, 22,24]. We have drawn from the ideas 
in these works for the core structure of the higher-order framework, but have 
added some new features of our own and adapted results to the higher-order 
setting. 

There is no true higher-order DP framework yet: both static and dynamic 
approaches actually lie halfway between the original “DP approach” of first- 
order rewriting and a full DP framework as in [20,22]. Most of these works 
[30-32,34,46] prove “non-loopingness” or “chain-freeness” of a set P of DPs 
through a number of theorems. Yet, there is no concept of DP problems, and the 
set R of rules cannot be altered. They also fix assumptions on dependency chains 
— such as minimality [34] or being “tagged” [31] — which frustrate extendability 
and are more naturally dealt with in a DP framework using flags. 

The static DP approach for higher-order term rewriting is discussed in, e.g., 
[34,44,46]. The approach is limited to plain function passing (PFP) systems. The 
definition of PFP has been made more liberal in later papers, but always con- 
cerns the position of higher-order variables in the left-hand sides of rules. These 
works include non-pattern HRSs [34,46], which we do not consider, but do not 
employ formative rules or meta-variable conditions, or consider non-termination, 
which we do. Importantly, they do not consider strictly positive inductive types, 
which could be used to significantly broaden the PFP restriction. Such types 
are considered in an early paper which defines a variation of static higher-order 
dependency pairs [6] based on a computability closure [7,8]. However, this work
carries different restrictions (e.g., DPs must be type-preserving and not introduce 
fresh variables) and considers only one analysis technique (reduction pairs). 

Definitions of DP approaches for functional programming also exist [32,33], 
which consider applicative systems with ML-style polymorphism. These works 
also employ a much broader, semantic definition than PFP, which is actually 
more general than the syntactic restriction we propose here. However, like the 
static approaches for term rewriting, they do not truly exploit the computability 
[47] properties inherent in this restriction: it is only used for the initial generation 
of dependency pairs. In the present work, we will take advantage of our exact 
computability notion by introducing a computable flag that can be used by 
the computable subterm criterion processor (Theorem 63) to handle benchmark 
systems that would otherwise be beyond the reach of static DPs. Also in these 
works, formative rules, meta-variable conditions and non-termination are not 
considered. 

Regarding dynamic DP approaches, a precursor of the present work is [31], 
which provides a halfway framework (methodology to prove “chain-freeness” ) 
for dynamic DPs, introduces a notion of formative rules, and briefly translates a 
basic form of static DPs to the same setting. Our formative reductions consider 
the shape of reductions rather than the rules they use, and they can be used as 
a flag in the framework to gain additional power in other processors. The adap- 
tation of static DPs in [31] was very limited, and did not for instance consider 
strictly positive inductive types or rules of functional type. 

For a more elaborate discussion of both static and dynamic DP approaches 
in the literature, we refer to [31] and the second author’s PhD thesis [29]. 


Organisation of the Paper. Section 2 introduces higher-order rewriting using 
AFSMs and recapitulates computability. In Sect. 3 we impose restrictions on 
the input AFSMs for which our framework is soundly applicable. In Sect. 4 we 
define static DPs for AFSMs, and derive the key results on them. Section 5 
formulates the DP framework and a number of DP processors for existing and 
new termination proving techniques. Section 6 concludes. Detailed proofs for all 
results in this paper and an experimental evaluation are available in a technical 
report [17]. In addition, many of the results have been informally published in 
the second author’s PhD thesis [29]. 


2 Preliminaries 


In this section, we first define our notation by introducing the AFSM formalism. 
Although not one of the standards of higher-order rewriting, AFSMs combine 
features from various forms of higher-order rewriting and can be seen as a form 
of IDTSs [5] which includes application. We will finish with a definition of com- 
putability, a technique often used for higher-order termination methods. 


2.1 Higher-Order Term Rewriting Using AFSMs 


Unlike first-order term rewriting, there is no single, unified approach to higher- 
order term rewriting, but rather a number of similar but not fully compatible 
systems aiming to combine term rewriting and typed λ-calculi. For generality, 
we will use Algebraic Functional Systems with Meta-variables: a formalism which 
admits translations from the main formats of higher-order term rewriting. 


Definition 1 (Simple types). We fix a set S of sorts. All sorts are simple 
types, and if σ, τ are simple types, then so is σ → τ. 


We let → be right-associative. Note that all types have a unique representation 
in the form σ₁ → ... → σₘ → ι with ι ∈ S. 
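
To make the unique decomposition concrete, the following Python sketch represents simple types and recovers the argument types and the output sort of a type. The encoding (sorts as strings, the class `Arrow`, and the helper names `arrow` and `decompose`) is an assumption made purely for illustration; it is not part of the formalism.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Arrow:
    """The simple type arg -> res."""
    arg: "SimpleType"
    res: "SimpleType"


# Assumption of this sketch: a sort is modelled as a plain string.
SimpleType = Union[str, Arrow]


def arrow(*types: SimpleType) -> SimpleType:
    """Build t1 -> (t2 -> (... -> tn)), reflecting right-associativity."""
    *args, result = types
    for arg in reversed(args):
        result = Arrow(arg, result)
    return result


def decompose(t: SimpleType) -> Tuple[List[SimpleType], str]:
    """Return ([sigma_1, ..., sigma_m], iota) for t = sigma_1 -> ... -> sigma_m -> iota."""
    args = []
    while isinstance(t, Arrow):
        args.append(t.arg)
        t = t.res
    return args, t


# A type with one functional argument: (nat -> nat) -> list -> list
map_type = arrow(arrow("nat", "nat"), "list", "list")
```

Here `decompose(map_type)` yields the two argument types `nat -> nat` and `list` together with the output sort `list`.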


Definition 2 (Terms and meta-terms). We fix disjoint sets F of function 
symbols, V of variables and M of meta-variables, each symbol equipped with 
a type. Each meta-variable is additionally equipped with a natural number. We 
assume that both V and M contain infinitely many symbols of all types. The set 
T(F,V) of terms over F,V consists of expressions s where s : σ can be derived 
for some type σ by the following clauses: 

(V) x : σ if x : σ ∈ V          (@) s t : τ if s : σ → τ and t : σ 
(F) f : σ if f : σ ∈ F          (Λ) λx.s : σ → τ if x : σ ∈ V and s : τ 

Meta-terms are expressions whose type can be derived by those clauses and: 

(M) Z⟨s₁,...,sₖ⟩ : σₖ₊₁ → ... → σₘ → ι 
    if Z : (σ₁ → ... → σₖ → ... → σₘ → ι, k) ∈ M and s₁ : σ₁, ..., sₖ : σₖ 

The λ binds variables as in the λ-calculus; unbound variables are called free, and 
FV(s) is the set of free variables in s. Meta-variables cannot be bound; we write 
FMV(s) for the set of meta-variables occurring in s. A meta-term s is called 
closed if FV(s) = ∅ (even if FMV(s) ≠ ∅). Meta-terms are considered modulo 
α-conversion. Application (@) is left-associative; abstractions (Λ) extend as far 
to the right as possible. A meta-term s has type σ if s : σ; it has base type if 
σ ∈ S. We define head(s) = head(s₁) if s = s₁ s₂, and head(s) = s otherwise. 

A (meta-)term s has a sub-(meta-)term t, notation s ⊵ t, if either s = t or 
s ▷ t, where s ▷ t if (a) s = λx.s′ and s′ ⊵ t, (b) s = s₁ s₂ and s₂ ⊵ t or (c) 
s = s₁ s₂ and s₁ ⊵ t. A (meta-)term s has a fully applied sub-(meta-)term t, 
notation s ▶̲ t, if either s = t or s ▶ t, where s ▶ t if (a) s = λx.s′ and s′ ▶̲ t, 
(b) s = s₁ s₂ and s₂ ▶̲ t or (c) s = s₁ s₂ and s₁ ▶ t (so if s = x s₁ s₂, then x 
and x s₁ are not fully applied subterms, but s and both s₁ and s₂ are). 

For Z : (σ,k) ∈ M, we call k the arity of Z, notation arity(Z). 
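
The difference between subterms and fully applied subterms can be illustrated by a small Python sketch. The term encoding is an assumption of the sketch: symbols and variables are strings, `("app", s, t)` is an application and `("lam", x, s)` an abstraction; meta-variable applications are left out for brevity.

```python
def app(head, *args):
    """Left-associative application: app(f, a, b) encodes (f a) b."""
    for a in args:
        head = ("app", head, a)
    return head


def head(s):
    """head(s) = head(s1) if s = s1 s2, and head(s) = s otherwise."""
    while isinstance(s, tuple) and s[0] == "app":
        s = s[1]
    return s


def subterms(s):
    """All t such that t is a subterm of s (s itself included)."""
    yield s
    if isinstance(s, tuple):
        if s[0] == "lam":
            yield from subterms(s[2])      # clause (a)
        else:
            yield from subterms(s[2])      # clause (b): the argument
            yield from subterms(s[1])      # clause (c): the function part


def strictly_fully_applied(s):
    """All t that are strict fully applied subterms of s."""
    if isinstance(s, tuple):
        if s[0] == "lam":
            yield from fully_applied(s[2])            # clause (a)
        else:
            yield from fully_applied(s[2])            # clause (b)
            yield from strictly_fully_applied(s[1])   # clause (c): strict
                                                      # on the function part


def fully_applied(s):
    """All t that are fully applied subterms of s (s itself included)."""
    yield s
    yield from strictly_fully_applied(s)


s = app("x", "s1", "s2")
```

For `s = x s1 s2`, the generator `fully_applied(s)` produces `s`, `s1` and `s2`, whereas `subterms(s)` additionally produces `x` and `x s1`.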


Clearly, all fully applied subterms are subterms, but not all subterms are 
fully applied. Every term s has a form t s₁ ⋯ sₙ with n ≥ 0 and t = head(s) a 
variable, function symbol, or abstraction; in meta-terms t may also be a 
meta-variable application F⟨s₁,...,sₙ⟩. Terms are the objects that we will rewrite; 
meta-terms are used to define rewrite rules. Note that all our terms (and meta- 
terms) are, by definition, well-typed. For rewriting, we will employ patterns: 


Definition 3 (Patterns). A meta-term is a pattern if it has one of the forms 
Z⟨x₁,...,xₖ⟩ with all xᵢ distinct variables; λx.ℓ with x ∈ V and ℓ a pattern; or 
a ℓ₁ ⋯ ℓₙ with a ∈ F ∪ V and all ℓᵢ patterns (n ≥ 0). 


In rewrite rules, we will use meta-variables for matching and variables 
only with binders. In terms, variables can occur both free and bound, and 
meta-variables cannot occur. Meta-variables originate in very early forms of 
higher-order rewriting (e.g., [1,27]), but have also been used in later formalisms 
(e.g., [8]). They strike a balance between matching modulo β and syntactic 
matching. By using meta-variables, we obtain the same expressive power as 
with Miller patterns [37], but do so without including a reversed β-reduction as 
part of matching. 


Notational Conventions: We will use x,y,z for variables, X,Y,Z for 
meta-variables, b for symbols that could be variables or meta-variables, f,g,h or more 
suggestive notation for function symbols, and s,t,u,v,q,w for (meta-)terms. 
Types are denoted σ,τ, and ι,κ are sorts. We will regularly overload notation 
and write x ∈ V, f ∈ F or Z ∈ M without stating a type (or minimal arity). 
For meta-terms Z⟨⟩ we will usually omit the brackets, writing just Z. 


Definition 4 (Substitution). A meta-substitution is a type-preserving function γ 
from variables and meta-variables to meta-terms. Let the domain of γ be given by: 
dom(γ) = {(x : σ) ∈ V | γ(x) ≠ x} ∪ {(Z : (σ,k)) ∈ M | γ(Z) ≠ λy₁...yₖ.Z⟨y₁,...,yₖ⟩}; 
this domain is allowed to be infinite. We let [b₁ := s₁,...,bₙ := sₙ] denote the 
meta-substitution γ with γ(bᵢ) = sᵢ and γ(z) = z for (z : σ) ∈ V \ {b₁,...,bₙ}, and 
γ(Z) = λy₁...yₖ.Z⟨y₁,...,yₖ⟩ for (Z : (σ,k)) ∈ M \ {b₁,...,bₙ}. We assume there are 
infinitely many variables x of all types such that (a) x ∉ dom(γ) and (b) for all 
b ∈ dom(γ): x ∉ FV(γ(b)). 
A substitution is a meta-substitution mapping everything in its domain to 
terms. The result sγ of applying a meta-substitution γ to a term s is obtained by: 

xγ = γ(x) if x ∈ V          (s t)γ = (sγ) (tγ) 
fγ = f if f ∈ F             (λx.s)γ = λx.(sγ) if γ(x) = x ∧ x ∉ ⋃ {FV(γ(y)) | y ∈ dom(γ)} 

For meta-terms, the result sγ is obtained by the clauses above and: 

Z⟨s₁,...,sₖ⟩γ = γ(Z)⟨s₁γ,...,sₖγ⟩ if Z ∈ dom(γ) 
Z⟨s₁,...,sₖ⟩γ = Z⟨s₁γ,...,sₖγ⟩ if Z ∉ dom(γ) 
(λx₁...xₖ.s)⟨t₁,...,tₖ⟩ = s[x₁ := t₁,...,xₖ := tₖ] 
(λx₁...xₙ.s)⟨t₁,...,tₖ⟩ = s[x₁ := t₁,...,xₙ := tₙ] tₙ₊₁ ⋯ tₖ if n < k and s is not an abstraction 

Note that for fixed k, any term has exactly one of the two forms above 
(λx₁...xₙ.s with n < k and s not an abstraction, or λx₁...xₖ.s). 


Essentially, applying a meta-substitution that has meta-variables in its 
domain combines a substitution with (possibly several) β-steps. For example, 
we have that: deriv (λx.sin (F⟨x⟩))[F := λy.plus y x] equals 
deriv (λz.sin (plus z x)). We also have: X⟨0,nil⟩[X := λx.map (λy.x)] equals 
map (λy.0) nil. 
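
The interplay of substitution and β-steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a full implementation: meta-terms are encoded as strings, `("app", s, t)`, `("lam", x, s)` and `("meta", Z, [args])`, and all bound variable names are assumed to be distinct from the free variables of the substituted terms, so that no renaming (α-conversion) is needed. The helper names are hypothetical.

```python
def app(head, *args):
    """Left-associative application: app(f, a, b) encodes (f a) b."""
    for a in args:
        head = ("app", head, a)
    return head


def subst(s, env):
    """Simultaneously replace free variables according to env.
    Assumption: binders in s do not capture variables of the replacements."""
    if isinstance(s, str):
        return env.get(s, s)
    if s[0] == "lam":
        inner = {x: t for x, t in env.items() if x != s[1]}
        return ("lam", s[1], subst(s[2], inner))
    if s[0] == "app":
        return ("app", subst(s[1], env), subst(s[2], env))
    return ("meta", s[1], [subst(t, env) for t in s[2]])


def meta_apply(body, args):
    """Compute (λx1...xn.s)⟨t1,...,tk⟩: consume at most k leading binders
    and apply the result to the remaining arguments."""
    env, rest = {}, list(args)
    while rest and isinstance(body, tuple) and body[0] == "lam":
        env[body[1]] = rest.pop(0)
        body = body[2]
    return app(subst(body, env), *rest)


def apply_gamma(s, gamma):
    """Apply the meta-substitution gamma (a dict) to the meta-term s."""
    if isinstance(s, str):
        return gamma.get(s, s)
    if s[0] == "lam":
        return ("lam", s[1], apply_gamma(s[2], gamma))
    if s[0] == "app":
        return ("app", apply_gamma(s[1], gamma), apply_gamma(s[2], gamma))
    args = [apply_gamma(t, gamma) for t in s[2]]
    if s[1] in gamma:
        return meta_apply(gamma[s[1]], args)
    return ("meta", s[1], args)


# X⟨0,nil⟩[X := λx.map (λy.x)]
gamma = {"X": ("lam", "x", app("map", ("lam", "y", "x")))}
result = apply_gamma(("meta", "X", ["0", "nil"]), gamma)
```

In the second clause of `meta_apply`, the body `map (λy.x)` is not an abstraction, so the remaining argument `nil` is passed by ordinary application; `result` is the encoding of map (λy.0) nil.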


Definition 5 (Rules and rewriting). Let F,V,M be fixed sets of function 
symbols, variables and meta-variables respectively. A rule is a pair ℓ ⇒ r of 
closed meta-terms of the same type such that ℓ is a pattern of the form f ℓ₁ ⋯ ℓₙ 
with f ∈ F and FMV(r) ⊆ FMV(ℓ). A set of rules R defines a rewrite relation 
⇒R as the smallest monotonic relation on terms which includes: 

(Rule) ℓδ ⇒R rδ if ℓ ⇒ r ∈ R and dom(δ) = FMV(ℓ) 

(Beta) (λx.s) t ⇒R s[x := t] 

We say s ⇒β t if s ⇒R t is derived using a (Beta) step. A term s is terminating 
under ⇒R if there is no infinite reduction s = s₀ ⇒R s₁ ⇒R ..., is in normal 
form if there is no t such that s ⇒R t, and is β-normal if there is no t with 
s ⇒β t. Note that we are allowed to reduce at any position of a term, even below 
a λ. The relation ⇒R is terminating if all terms over F,V are terminating. The 
set D ⊆ F of defined symbols consists of those (f : σ) ∈ F such that a rule 
f ℓ₁ ⋯ ℓₙ ⇒ r exists; all other symbols are called constructors. 


Note that R is allowed to be infinite, which is useful for instance to model 
polymorphic systems. Also, right-hand sides of rules do not have to be in β-normal 
form. While this is rarely used in practical examples, non-β-normal rules 
may arise through transformations, and we lose nothing by allowing them. 


Example 6. Let F ⊇ {0 : nat, s : nat → nat, nil : list, cons : nat → list → 
list, map : (nat → nat) → list → list} and consider the following rules R: 

map (λx.Z⟨x⟩) nil ⇒ nil 
map (λx.Z⟨x⟩) (cons H T) ⇒ cons Z⟨H⟩ (map (λx.Z⟨x⟩) T) 

Then map (λy.0) (cons (s 0) nil) ⇒R cons 0 (map (λy.0) nil) ⇒R cons 0 nil. 
Note that the bound variable y does not need to occur in the body of λy.0 to 
match λx.Z⟨x⟩. However, a term like map s (cons 0 nil) cannot be reduced, 
because s does not instantiate λx.Z⟨x⟩. We could alternatively consider the 
rules: 

map Z nil ⇒ nil 
map Z (cons H T) ⇒ cons (Z H) (map Z T) 

Where the system before had (Z : (nat → nat,1)) ∈ M, here we 
assume (Z : (nat → nat,0)) ∈ M. Thus, rather than meta-variable 
application Z⟨H⟩ we use explicit application Z H. Then map s (cons 0 nil) ⇒R 
cons (s 0) (map s nil). However, we will often need explicit β-reductions; e.g., 
map (λy.0) (cons (s 0) nil) ⇒R cons ((λy.0) (s 0)) (map (λy.0) nil) ⇒β 
cons 0 (map (λy.0) nil). 


Definition 7 (AFSM). An AFSM is a tuple (F,V,M,R) of a signature and 
a set of rules built from meta-terms over F,V,M; as types of relevant variables 
and meta-variables can always be derived from context, we will typically just refer 
to the AFSM (F,R). An AFSM implicitly defines the abstract reduction system 
(T(F,V), ⇒R): a set of terms and a rewrite relation on this set. An AFSM is 
terminating if ⇒R is terminating (on all terms in T(F,V)). 


Discussion: The two most common formalisms in termination analysis of higher-order 
rewriting are algebraic functional systems [26] (AFSs) and higher-order 
rewriting systems [37,39] (HRSs). AFSs are very similar to our AFSMs, but 
use variables for matching rather than meta-variables; this is trivially translated 
to the AFSM format, giving rules where all meta-variables have arity 0, like 
the “alternative” rules in Example 6. HRSs use matching modulo β/η, but the 
common restriction of pattern HRSs can be directly translated into AFSMs, 
provided terms are β-normalised after every reduction step. Even without this 
β-normalisation step, termination of the obtained AFSM implies termination of 
the original HRS; for second-order systems, termination is equivalent. AFSMs 
can also naturally encode CRSs [27] and several applicative systems (cf. [29, 
Chapter 3]). 


Example 8 (Ordinal recursion). A running example is the AFSM (F,R) with 
F ⊇ {0 : ord, s : ord → ord, lim : (nat → ord) → ord, rec : ord → nat → 
(ord → nat → nat) → ((nat → ord) → (nat → nat) → nat) → nat} and R 
given below. As all meta-variables have arity 0, this can be seen as an AFS. 

rec 0 K F G ⇒ K 
rec (s X) K F G ⇒ F X (rec X K F G) 
rec (lim H) K F G ⇒ G H (λm.rec (H m) K F G) 


Observant readers may notice that by the given constructors, the type nat in 
Example 8 is not inhabited. However, as the given symbols are only a subset of F, 
additional symbols (such as constructors for the nat type) may be included. The 
presence of additional function symbols does not affect termination of AFSMs: 


Theorem 9 (Invariance of termination under signature extensions). 
For an AFSM (F,R) with F at most countably infinite, let funs(R) ⊆ F be 
the set of function symbols occurring in some rule of R. Then (T(F,V), ⇒R) is 
terminating if and only if (T(funs(R),V), ⇒R) is terminating. 


Proof. Trivial by replacing all function symbols in F \ funs(R) by corresponding 
variables of the same type. 


Therefore, we will typically only state the types of symbols occurring in the 
rules, but may safely assume that infinitely many symbols of all types are present 
(which for instance allows us to select unused constructors in some proofs). 


2.2 Computability 


A common technique in higher-order termination is Tait and Girard’s com- 
putability notion [47]. There are several ways to define computability predicates; 
here we follow, e.g., [5,7-9] in considering accessible meta-terms using strictly 
positive inductive types. The definition presented below is adapted from these 
works, both to account for the altered formalism and to introduce (and obtain 
termination of) a relation ⇛C that we will use in the “computable subterm 
criterion processor” of Theorem 63 (a termination criterion that allows us to handle 
systems that would otherwise be beyond the reach of static DPs). This allows 
for a minimal presentation that avoids the use of ordinals that would otherwise 
be needed to obtain ⇛C (see, e.g., [7,9]). 

To define computability, we use the notion of an RC-set: 


Definition 10. A set of reducibility candidates, or RC-set, for a rewrite 
relation ⇒R of an AFSM is a set I of base-type terms s such that: every term in I 
is terminating under ⇒R; I is closed under ⇒R (so if s ∈ I and s ⇒R t then 
t ∈ I); if s = x s₁ ⋯ sₙ with x ∈ V or s = (λx.u) s₀ ⋯ sₙ with n ≥ 0, and for 
all t with s ⇒R t we have t ∈ I, then s ∈ I (for any u, s₀,...,sₙ ∈ T(F,V)). 

We define I-computability for an RC-set I by induction on types. For s ∈ 
T(F,V), we say that s is I-computable if either s is of base type and s ∈ I; or 
s : σ → τ and for all t : σ that are I-computable, s t is I-computable. 


The traditional notion of computability is obtained by taking for I the set of 
all terminating base-type terms. Then, a term s is computable if and only if (a) 
s has base type and is terminating; or (b) s : σ → τ and for all computable t : σ 
the term s t is computable. This choice is simple but, for reasoning, not ideal: 
we do not have a property like: “if f s₁ ⋯ sₙ is computable then so is each sᵢ”. 
Such a property would be valuable to have for generalising termination proofs 
from first-order to higher-order rewriting, as it allows us to use computability 
where the first-order proof uses termination. While it is not possible to define 
a computability notion with this property alongside case (b) (as such a notion 
would not be well-founded), we can come close to this property by choosing 
a different set for I. To define this set, we will use the notion of accessible 
arguments, which is used for the same purpose also in the General Schema [8], 
the Computability Path Ordering [9], and the Computability Closure [7]. 


Definition 11 (Accessible arguments). We fix a quasi-ordering ≽S on S 
with well-founded strict part ≻S := ≽S \ ≼S.¹ For a type σ ≡ σ₁ → ... → σₘ → κ 
(with κ ∈ S) and sort ι, let ι ≽S₊ σ if ι ≽S κ and ι ≻S₋ σᵢ for all i, and let 
ι ≻S₋ σ if ι ≻S κ and ι ≽S₊ σᵢ for all i.² 

For f : σ₁ → ... → σₘ → ι ∈ F, let Acc(f) = {i | 1 ≤ i ≤ m ∧ ι ≽S₊ σᵢ}. 
For x : σ₁ → ... → σₘ → ι ∈ V, let Acc(x) = {i | 1 ≤ i ≤ m ∧ σᵢ has the form 
τ₁ → ... → τₙ → κ with ι ≽S κ}. We write s ⊵acc t if either s = t, or s = λx.s′ 
and s′ ⊵acc t, or s = a s₁ ⋯ sₙ with a ∈ F ∪ V and sᵢ ⊵acc t for some i ∈ Acc(a) 
with a ∉ FV(sᵢ). 


With this definition, we will be able to define a set C such that, roughly, s 
is C-computable if and only if (a) s : σ → τ and s t is C-computable for all 
C-computable t, or (b) s has base type, is terminating, and if s = f s₁ ⋯ sₘ then 
sᵢ is C-computable for all accessible i (see Theorem 13 below). The reason that 
Acc(x) for x ∈ V is different is proof-technical: computability of λx.x s₁ ⋯ sₘ 
implies the computability of more arguments sᵢ than computability of f s₁ ⋯ sₘ 
does, since x can be instantiated by anything. 

¹ Well-foundedness is immediate if S is finite, but we have not imposed that 
requirement. 

² Here ι ≽S₊ σ corresponds to “ι occurs only positively in σ” in [5,8,9]. 


Example 12. Consider a quasi-ordering ≽S such that ord ≻S nat. In Example 8, 
we then have ord ≽S₊ nat → ord. Thus, 1 ∈ Acc(lim), which gives lim H ⊵acc H. 
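
The computation of Acc(f) is mechanical once a sort ordering is fixed. The Python sketch below follows the two mutually recursive relations of Definition 11. As simplifying assumptions, the sort ordering is given by an integer rank per sort (so it is total, which the definition does not require), sorts are strings, and an arrow type σ → τ is the tuple `("->", σ, τ)`; all function names are hypothetical.

```python
def arrow(*types):
    """Right-associative arrow type: arrow(a, b, c) encodes a -> (b -> c)."""
    *args, result = types
    for arg in reversed(args):
        result = ("->", arg, result)
    return result


def decompose(t):
    """Split sigma_1 -> ... -> sigma_m -> kappa into ([sigma_1..sigma_m], kappa)."""
    args = []
    while isinstance(t, tuple):
        args.append(t[1])
        t = t[2]
    return args, t


def pos(iota, sigma, rank):
    """The 'positive' relation: iota is at least the output sort of sigma,
    and iota relates negatively to every argument type of sigma."""
    args, kappa = decompose(sigma)
    return rank[iota] >= rank[kappa] and all(neg(iota, a, rank) for a in args)


def neg(iota, sigma, rank):
    """The 'negative' relation: iota is strictly above the output sort of
    sigma, and iota relates positively to every argument type of sigma."""
    args, kappa = decompose(sigma)
    return rank[iota] > rank[kappa] and all(pos(iota, a, rank) for a in args)


def acc(symbol_type, rank):
    """Accessible argument positions (1-based) of a function symbol."""
    args, iota = decompose(symbol_type)
    return {i for i, a in enumerate(args, 1) if pos(iota, a, rank)}


rank = {"ord": 1, "nat": 0}          # encodes ord strictly above nat
lim_type = arrow(arrow("nat", "ord"), "ord")
```

With `ord` ranked strictly above `nat`, `acc(lim_type, rank)` contains position 1; if both sorts receive the same rank, the set is empty, which illustrates why a non-trivial sort ordering is needed for this symbol.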


Theorem 13. Let (F,R) be an AFSM. Let f s₁ ⋯ sₘ ⇛I sᵢ t₁ ⋯ tₙ if both 
sides have base type, i ∈ Acc(f), and all tⱼ are I-computable. There is an 
RC-set C such that C = {s ∈ T(F,V) | s has base type ∧ s is terminating under 
⇒R ∪ ⇛C ∧ if s ⇒*R f s₁ ⋯ sₘ then sᵢ is C-computable for all i ∈ Acc(f)}. 


Proof (sketch). Note that we cannot define C as this set, as the set relies on 
the notion of C-computability. However, we can define C as the fixpoint of a 
monotone function operating on RC-sets. This follows the proof in, e.g., [8,9]. 
The complete proof is available in [17, Appendix A]. 


3 Restrictions 


The termination methodology in this paper is restricted to AFSMs that satisfy 
certain limitations: they must be properly applied (a restriction on the number 
of terms each function symbol is applied to) and accessible function passing (a 
restriction on the positions of variables of a functional type in the left-hand sides 
of rules). Both are syntactic restrictions that are easily checked by a computer 
(mostly; the latter requires a search for a sort ordering, but this is typically 
easy). 


3.1 Properly Applied AFSMs 


In properly applied AFSMs, function symbols are assigned a certain, minimal 
number of arguments that they must always be applied to. 


Definition 14. An AFSM (F,R) is properly applied if for every f ∈ D there 
exists an integer k such that for all rules ℓ ⇒ r ∈ R: (1) if ℓ = f ℓ₁ ⋯ ℓₙ then 
n = k; and (2) if r ▶̲ f r₁ ⋯ rₙ then n ≥ k. We denote minar(f) = k. 


That is, every occurrence of a function symbol in the right-hand side of a rule 
has at least as many arguments as the occurrences in the left-hand sides of rules. 
This means that partially applied functions are often not allowed: an AFSM with 
rules such as double X ⇒ plus X X and doublelist L ⇒ map double L is not 
properly applied, because double is applied to one argument in the left-hand 
side of some rule, and to zero in the right-hand side of another. 
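
The check of Definition 14 can be sketched in Python as follows. The encoding is an assumption of the sketch: symbols, variables and arity-0 meta-variables are strings, `("app", s, t)` is an application, `("lam", x, s)` an abstraction and `("meta", Z, [args])` a meta-variable application; `properly_applied` and the other helper names are hypothetical.

```python
def app(head, *args):
    """Left-associative application: app(f, a, b) encodes (f a) b."""
    for a in args:
        head = ("app", head, a)
    return head


def spine(s):
    """Split s = h s1 ... sn into (h, [s1, ..., sn]) with h not an application."""
    args = []
    while isinstance(s, tuple) and s[0] == "app":
        args.append(s[2])
        s = s[1]
    return s, args[::-1]


def occurrences(s):
    """Yield (symbol, argument count) for each maximal application in s."""
    h, args = spine(s)
    if isinstance(h, str):
        yield h, len(args)
    elif h[0] == "lam":
        yield from occurrences(h[2])
    else:  # meta-variable application: inspect its arguments
        for t in h[2]:
            yield from occurrences(t)
    for a in args:
        yield from occurrences(a)


def properly_applied(rules):
    """rules is a list of (lhs, rhs) pairs of encoded meta-terms."""
    minar = {}
    # Condition (1): all left-hand sides with root f have the same arity.
    for lhs, _ in rules:
        f, args = spine(lhs)
        if minar.setdefault(f, len(args)) != len(args):
            return False
    # Condition (2): defined symbols in right-hand sides take >= minar args.
    for _, rhs in rules:
        for g, n in occurrences(rhs):
            if g in minar and n < minar[g]:
                return False
    return True


partial = [
    (app("double", "X"), app("plus", "X", "X")),
    (app("doublelist", "L"), app("map", "double", "L")),
]
```

Here `properly_applied(partial)` is false, since `double` occurs without arguments in the second right-hand side although it takes one argument in the left-hand side of the first rule.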

This restriction is not as severe as it may initially seem since partial 
applications can be replaced by λ-abstractions; e.g., the rules above can be 
made properly applied by replacing the second rule by: doublelist L ⇒ 
map (λx.double x) L. By using η-expansion, we can transform any AFSM to 
satisfy this restriction: 


Definition 15 (R↑). Given a set of rules R, let their η-expansion be given by 
R↑ = {(ℓ Z₁ ⋯ Zₘ)↑η ⇒ (r Z₁ ⋯ Zₘ)↑η | ℓ ⇒ r ∈ R with r : σ₁ → ... → σₘ → ι, 
ι ∈ S, and Z₁,...,Zₘ fresh meta-variables}, where 

- s↑η = λx₁...xₘ.s̄ (x₁↑η) ⋯ (xₘ↑η) if s is an application or element of V ∪ F, 
  and s↑η = s̄ otherwise; 
- f̄ = f for f ∈ F and x̄ = x for x ∈ V, while the overlined form of Z⟨s₁,...,sₖ⟩ 
  is Z⟨s̄₁,...,s̄ₖ⟩, that of λx.s is λx.(s↑η), and that of s₁ s₂ is s̄₁ (s₂↑η). 


Note that ℓ↑η is a pattern if ℓ is. By [29, Thm. 2.16], a relation ⇒R is 
terminating if ⇒R↑ is terminating, which allows us to transpose any methods to 
prove termination of properly applied AFSMs to all AFSMs. 

However, there is a caveat: this transformation can introduce non-termination 
in some special cases, e.g., the terminating rule f X ⇒ g f with f : o → o and 
g : (o → o) → o, whose η-expansion f X ⇒ g (λx.(f x)) is non-terminating. 
Thus, for a properly applied AFSM the methods in this paper apply directly. 
For an AFSM that is not properly applied, we can use the methods to prove 
termination (but not non-termination) by first η-expanding the rules. Of course, 
if this analysis leads to a counterexample for termination, we may still be able 
to verify whether this counterexample applies in the original, untransformed 
AFSM. 


Example 16. Both AFSMs in Example 6 and the AFSM in Example 8 are prop- 
erly applied. 


Example 17. Consider an AFSM (F,R) with F ⊇ {sin, cos : real → 
real, times : real → real → real, deriv : (real → real) → real → real} 
and R = {deriv (λx.sin F⟨x⟩) ⇒ λy.times (deriv (λx.F⟨x⟩) y) (cos F⟨y⟩)}. 
Although the one rule has a functional output type (real → real), this AFSM is 
properly applied, with deriv having always at least 1 argument. Therefore, we do 
not need to use R↑. However, if R were to additionally include some rules that did 
not satisfy the restriction (such as the double and doublelist rules above), then 
η-expanding all rules, including this one, would be necessary. We have: R↑ = 
{deriv (λx.sin F⟨x⟩) Y ⇒ (λy.times (deriv (λx.F⟨x⟩) y) (cos F⟨y⟩)) Y}. 
Note that the right-hand side of the η-expanded deriv rule is not β-normal. 


3.2 Accessible Function Passing AFSMs 


In accessible function passing AFSMs, variables of functional type may not occur 
at arbitrary places in the left-hand sides of rules: their positions are restricted 
using the sort ordering ≽S and accessibility relation ⊵acc from Definition 11. 


Definition 18 (Accessible function passing). An AFSM (F,R) is accessible 
function passing (AFP) if there exists a sort ordering ≽S following Definition 
11 such that: for all f ℓ₁ ⋯ ℓₙ ⇒ r ∈ R and all Z ∈ FMV(r): there are variables 
x₁,...,xₖ and some i such that ℓᵢ ⊵acc Z⟨x₁,...,xₖ⟩. 


The key idea of this definition is that computability of each ℓᵢ implies 
computability of all meta-variables in r. This excludes cases like Example 20 below. 
Many common examples satisfy this restriction, including those we saw before: 


Example 19. Both systems from Example 6 are AFP: choosing the sort ordering 
≽S that equates nat and list, we indeed have cons H T ⊵acc H and 
cons H T ⊵acc T (as Acc(cons) = {1,2}) and both λx.Z⟨x⟩ ⊵acc Z⟨x⟩ and 
Z ⊵acc Z. The AFSM from Example 8 is AFP because we can choose ord ≻S 
nat and have lim H ⊵acc H following Example 12 (and also s X ⊵acc X 
and K ⊵acc K, F ⊵acc F, G ⊵acc G). The AFSM from Example 17 is AFP, 
because λx.sin F⟨x⟩ ⊵acc F⟨x⟩ for any ≽S: λx.sin F⟨x⟩ ⊵acc F⟨x⟩ because 
sin F⟨x⟩ ⊵acc F⟨x⟩ because 1 ∈ Acc(sin). 


In fact, all first-order AFSMs (where all fully applied sub-meta-terms of the 
left-hand side of a rule have base type) are AFP via the sort ordering ≽S that 
equates all sorts. Also (with the same sort ordering), an AFSM (F,R) is AFP if, 
for all rules f ℓ₁ ⋯ ℓₖ ⇒ r ∈ R and all 1 ≤ i ≤ k, we can write: ℓᵢ = λx₁...x_{nᵢ}.ℓ′ 
where nᵢ ≥ 0 and all fully applied sub-meta-terms of ℓ′ have base type. 

This covers many practical systems, although for Example 8 we need a 
non-trivial sort ordering. Also, there are AFSMs that cannot be handled with any ≽S. 


Example 20 (Encoding the untyped λ-calculus). Consider an AFSM with F ⊇ 
{ap : o → o → o, lm : (o → o) → o} and R = {ap (lm F) ⇒ F} (note that 
the only rule has type o → o). This AFSM is not accessible function passing, 
because lm F ⊵acc F cannot hold for any ≽S (as this would require o ≻S o). 

Note that this example is also not terminating. With t = lm (λx.ap x x), we 
get this self-loop as evidence: ap t t ⇒R (λx.ap x x) t ⇒β ap t t. 


Intuitively: in an accessible function passing AFSM, meta-variables of a 
higher type may occur only in “safe” places in the left-hand sides of rules. Rules 
like the ones in Example 20, where a higher-order meta-variable is lifted out of 
a base-type term, are not admitted (unless the base type is greater than the 
higher type). 


In the remainder of this paper, we will refer to a properly applied, accessible 
function passing AFSM as a PA-AFP AFSM. 


Discussion: This definition is strictly more liberal than the notions of “plain 
function passing” in both [34] and [46] as adapted to AFSMs. The notion in 
[46] largely corresponds to AFP if ≽S equates all sorts, and the HRS formalism 
guarantees that rules are properly applied (in fact, all fully applied sub-meta- 
terms of both left- and right-hand sides of rules have base type). The notion 
in [34] is more restrictive. The current restriction of PA-AFP AFSMs lets us 
handle examples like ordinal recursion (Example 8) which are not covered by 
[34,46]. However, note that [34,46] consider a different formalism, which does 
take rules whose left-hand side is not a pattern into account (which we do not 
consider). Our restriction also quite resembles the “admissible” rules in [6] which 
are defined using a pattern computability closure [5], but that work carries 
additional restrictions. 

In later work [32,33], Kusakari extends the static DP approach to forms of 
polymorphic functional programming, with a very liberal restriction: the defi- 
nition is parametrised with an arbitrary RC-set and corresponding accessibility 
(“safety”) notion. Our AFP restriction is actually an instance of this condition 
(although a more liberal one than the example RC-set used in [32,33]). We have 
chosen a specific instance because it allows us to use dedicated techniques for 
the RC-set; for example, our computable subterm criterion processor (Theorem 
63). 


4 Static Higher-Order Dependency Pairs 


To obtain sufficient criteria for both termination and non-termination of AFSMs, 
we will now transpose the definition of static dependency pairs [6,33, 34,46] to 
AFSMs. In addition, we will add the new features of meta-variable conditions, 
formative reductions, and computable chains. Complete versions of all proof 
sketches in this section are available in [17, Appendix B]. 

Although we retain the first-order terminology of dependency pairs, the set- 
ting with meta-variables makes it more suitable to define DPs as triples. 


Definition 21 ((Static) Dependency Pair). A dependency pair (DP) is a 
triple ℓ ⇛ p (A), where ℓ is a closed pattern f ℓ₁ ⋯ ℓₖ, p is a closed meta-term 
g p₁ ⋯ pₙ, and A is a set of meta-variable conditions: pairs Z : i indicating that 
Z regards its iᵗʰ argument. A DP is conservative if FMV(p) ⊆ FMV(ℓ). 

A substitution γ respects a set of meta-variable conditions A if for all Z : i in 
A we have γ(Z) = λx₁...xⱼ.t with either i > j, or i ≤ j and xᵢ ∈ FV(t). DPs 
will be used only with substitutions that respect their meta-variable conditions. 

For ℓ ⇛ p (∅) (so a DP whose set of meta-variable conditions is empty), we 
often omit the third component and just write ℓ ⇛ p. 
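
Conservativeness is a purely syntactic property and is easy to test. The Python sketch below computes FMV and compares both sides of a DP. The encoding is an assumption of the sketch: every meta-variable occurrence, also one of arity 0, is written `("meta", Z, [args])`, applications are `("app", s, t)`, abstractions are `("lam", x, s)`, and all other symbols are strings; marked symbols are simply written with a trailing `#`.

```python
def app(head, *args):
    """Left-associative application: app(f, a, b) encodes (f a) b."""
    for a in args:
        head = ("app", head, a)
    return head


def meta(name, *args):
    """Meta-variable application Z⟨args⟩ (Z itself when args is empty)."""
    return ("meta", name, list(args))


def fmv(s):
    """The set of meta-variables occurring in the meta-term s."""
    if isinstance(s, str):
        return set()
    if s[0] == "meta":
        found = {s[1]}
        for t in s[2]:
            found |= fmv(t)
        return found
    if s[0] == "lam":
        return fmv(s[2])
    return fmv(s[1]) | fmv(s[2])


def is_conservative(lhs, rhs):
    """A DP is conservative if FMV(rhs) is a subset of FMV(lhs)."""
    return fmv(rhs) <= fmv(lhs)


# An illustrative DP whose right-hand side introduces a fresh meta-variable X.
fresh_lhs = app("f#", "0")
fresh_rhs = app("f#", meta("X"))
```

For this illustrative pair, `is_conservative(fresh_lhs, fresh_rhs)` is false, because `X` occurs only on the right.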


Like the first-order setting, the static DP approach employs marked function 
symbols to obtain meta-terms whose instances cannot be reduced at the root. 


Definition 22 (Marked symbols). Let (F,R) be an AFSM. Define F♯ := 
F ⊎ {f♯ : σ | f : σ ∈ D}. For a meta-term s = f s₁ ⋯ sₖ with f ∈ D and 
k = minar(f), we let s♯ = f♯ s₁ ⋯ sₖ; for s of other forms s♯ is not defined. 


Moreover, we will consider candidates. In the first-order setting, candidate 
terms are subterms of the right-hand sides of rules whose root symbol is a defined 
symbol. Intuitively, these subterms correspond to function calls. In the current 
setting, we have to consider also meta-variables as well as rules whose right-hand 
side is not β-normal (which might arise for instance due to η-expansion). 


Definition 23 (β-reduced-sub-meta-term, ⊵β, ⊵A). A meta-term s has a 
fully applied β-reduced-sub-meta-term t (shortly, BRSMT), notation s ⊵β t, if 
there exists a set of meta-variable conditions A with s ⊵A t. Here s ⊵A t holds if: 

- s = t, or 
- s = λx.u and u ⊵A t, or 
- s = (λx.u) s₀ ⋯ sₙ and some sᵢ ⊵A t, or u[x := s₀] s₁ ⋯ sₙ ⊵A t, or 
- s = a s₁ ⋯ sₙ with a ∈ F ∪ V and some sᵢ ⊵A t, or 
- s = Z⟨t₁,...,tₖ⟩ s₁ ⋯ sₙ and some sᵢ ⊵A t, or 
- s = Z⟨t₁,...,tₖ⟩ s₁ ⋯ sₙ and tᵢ ⊵A t for some i ∈ {1,...,k} with (Z : i) ∈ A. 


Essentially, s ⊵A t means that t can be reached from s by taking β-reductions 
at the root and “subterm”-steps, where Z : i is in A whenever we pass into 
argument i of a meta-variable Z. BRSMTs are used to generate candidates: 


Definition 24 (Candidates). For a meta-term s, the set cand(s) of candidates 
of s consists of those pairs t (A) such that (a) t has the form f s₁ ⋯ sₖ 
with f ∈ D and k = minar(f), and (b) there are sₖ₊₁,...,sₙ (with n ≥ k) such 
that s ⊵A t sₖ₊₁ ⋯ sₙ, and (c) A is minimal: there is no subset A′ ⊊ A with 
s ⊵A′ t. 


Example 25. In AFSMs where all meta-variables have arity 0 and the right-hand 
sides of rules are β-normal, the set cand(s) for a meta-term s consists 
exactly of the pairs t (∅) where t has the form f s₁ ⋯ sₖ with k = minar(f) and t 
occurs as part of s. In Example 8, we thus have cand(G H (λm.rec (H m) K F G)) = 
{rec (H m) K F G (∅)}. 


If some of the meta-variables do take arguments, then the meta-variable 
conditions matter: candidates of s are pairs t (A) where A contains exactly 
those pairs Z : i for which we pass through the iᵗʰ argument of Z to reach t in s. 


Example 26. Consider an AFSM with the signature from Example 8 but a rule 
using meta-variables with larger arities: 


rec (lim (λn.H⟨n⟩)) K (λx.λn.F⟨x,n⟩) (λf.λg.G⟨f,g⟩) ⇒ 
G⟨λn.H⟨n⟩, λm.rec H⟨m⟩ K (λx.λn.F⟨x,n⟩) (λf.λg.G⟨f,g⟩)⟩ 


The right-hand side has one candidate: 
rec H⟨m⟩ K (λx.λn.F⟨x,n⟩) (λf.λg.G⟨f,g⟩) ({G : 2}) 


The original static approaches define DPs as pairs ℓ♯ ⇛ p♯ where ℓ ⇒ r is a 
rule and p a subterm of r of the form f r₁ ⋯ rₘ, as their rules are built using 
terms, not meta-terms. This can set variables bound in r free in p. In the current 
setting, we use candidates with their meta-variable conditions and implicit 
β-steps rather than subterms, and we replace such variables by meta-variables. 


Definition 27 (SDP). Let s be a meta-term and (F,R) be an AFSM. Let 
metafy(s) denote s with all free variables replaced by corresponding 
meta-variables. Now SDP(R) = {ℓ♯ ⇛ metafy(p♯) (A) | ℓ ⇒ r ∈ R ∧ p (A) ∈ cand(r)}. 


Although static DPs always have a pleasant form f♯ ℓ₁ ⋯ ℓₖ ⇛ 
g♯ p₁ ⋯ pₙ (A) (as opposed to the dynamic DPs of, e.g., [31], whose right-hand 
sides can have a meta-variable at the head, which complicates various techniques 
in the framework), they have two important complications not present in 
first-order DPs: the right-hand side p of a DP ℓ ⇛ p (A) may contain meta-variables 
that do not occur in the left-hand side ℓ (traditional analysis techniques are not 
really equipped for this), and the left- and right-hand sides may have different 
types. In Sect. 5 we will explore some methods to deal with these features. 


Example 28. For the non-η-expanded rules of Example 17, the set SDP(R) has 
one element: deriv♯ (λx.sin F⟨x⟩) ⇛ deriv♯ (λx.F⟨x⟩). (As times and cos are 
not defined symbols, they do not generate dependency pairs.) The set SDP(R↑) 
for the η-expanded rules is {deriv♯ (λx.sin F⟨x⟩) Y ⇛ deriv♯ (λx.F⟨x⟩) Y}. 
To obtain the relevant candidate, we used the β-reduction step of BRSMTs. 


Example 29. The AFSM from Example 8 is AFP following Example 19; here 
SDP(R) is: 


rec♯ (s X) K F G ⇛ rec♯ X K F G (∅) 
rec♯ (lim H) K F G ⇛ rec♯ (H M) K F G (∅) 


Note that the right-hand side of the second DP contains a meta-variable that is 
not on the left. As we will see in Example 64, that is not problematic here. 
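
For systems like this one, where all meta-variables have arity 0 and right-hand sides are β-normal, the construction of SDP(R) reduces to collecting the defined-symbol calls in each right-hand side and turning bound variables that become free into meta-variables. The following Python sketch does this for the rules of Example 8. It is a simplified illustration and not the general construction: meta-variable arguments, meta-variable conditions and β-steps are ignored, terms are encoded as strings, `("app", s, t)` and `("lam", x, s)`, marked symbols carry a trailing `#`, and `metafy` is assumed to map a variable to the meta-variable with the upper-cased name.

```python
def app(head, *args):
    """Left-associative application: app(f, a, b) encodes (f a) b."""
    for a in args:
        head = ("app", head, a)
    return head


def spine(s):
    """Split s = h s1 ... sn into (h, [s1, ..., sn]) with h not an application."""
    args = []
    while isinstance(s, tuple) and s[0] == "app":
        args.append(s[2])
        s = s[1]
    return s, args[::-1]


def metafy(t, bound):
    """Replace free occurrences of the variables in `bound` by meta-variables."""
    if isinstance(t, str):
        return t.upper() if t in bound else t
    if t[0] == "lam":
        return ("lam", t[1], metafy(t[2], bound - {t[1]}))
    return ("app", metafy(t[1], bound), metafy(t[2], bound))


def candidates(s, minar, bound=frozenset()):
    """Yield the marked, metafied candidates f# s1 ... sk found in s."""
    h, args = spine(s)
    if isinstance(h, str):
        if h in minar:
            yield metafy(app(h + "#", *args[:minar[h]]), bound)
    else:  # abstraction: its variable may become free in a candidate
        yield from candidates(h[2], minar, bound | {h[1]})
    for a in args:
        yield from candidates(a, minar, bound)


def sdp(rules):
    """Static DPs (lhs#, rhs#) for a properly applied list of (lhs, rhs) rules."""
    minar = {spine(l)[0]: len(spine(l)[1]) for l, _ in rules}
    pairs = []
    for l, r in rules:
        f, args = spine(l)
        for p in candidates(r, minar):
            pairs.append((app(f + "#", *args), p))
    return pairs


rules = [
    (app("rec", "0", "K", "F", "G"), "K"),
    (app("rec", app("s", "X"), "K", "F", "G"),
     app("F", "X", app("rec", "X", "K", "F", "G"))),
    (app("rec", app("lim", "H"), "K", "F", "G"),
     app("G", "H", ("lam", "m", app("rec", app("H", "m"), "K", "F", "G")))),
]
```

Calling `sdp(rules)` yields the two DPs listed above; the first rule contributes nothing, and the bound variable m of the third rule is turned into the meta-variable M.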


Termination analysis using dependency pairs importantly considers the 
notion of a dependency chain. This notion is fairly similar to the first-order 
setting: 


Definition 30 (Dependency chain). Let P be a set of DPs and R a set of 
rules. A (finite or infinite) (P,R)-dependency chain (or just (P,R)-chain) is 
a sequence [(ℓ₀ ⇛ p₀ (A₀), s₀, t₀), (ℓ₁ ⇛ p₁ (A₁), s₁, t₁), ...] where each ℓᵢ ⇛ 
pᵢ (Aᵢ) ∈ P and all sᵢ, tᵢ are terms, such that for all i: 


1. there exists a substitution γ on domain FMV(ℓᵢ) ∪ FMV(pᵢ) such that sᵢ = 
   ℓᵢγ, tᵢ = pᵢγ and for all Z ∈ dom(γ): γ(Z) respects Aᵢ; 
2. we can write tᵢ = f u₁ ⋯ uₙ and sᵢ₊₁ = f w₁ ⋯ wₙ and each uⱼ ⇒*R wⱼ. 


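Example 31. In the (first) AFSM from Example 6, we have SDP(R) =
$\{\mathtt{map}^\sharp\ (\lambda x.Z(x))\ (\mathtt{cons}\ H\ T) \Rrightarrow \mathtt{map}^\sharp\ (\lambda x.Z(x))\ T\}$. An example of
a finite dependency chain is $[(\rho, s_1, t_1), (\rho, s_2, t_2)]$ where $\rho$ is the one
DP, $s_1 = \mathtt{map}^\sharp\ (\lambda x.\mathtt{s}\ x)\ (\mathtt{cons}\ 0\ (\mathtt{cons}\ (\mathtt{s}\ 0)\ (\mathtt{map}\ (\lambda x.x)\ \mathtt{nil})))$
and $t_1 = \mathtt{map}^\sharp\ (\lambda x.\mathtt{s}\ x)\ (\mathtt{cons}\ (\mathtt{s}\ 0)\ (\mathtt{map}\ (\lambda x.x)\ \mathtt{nil}))$ and
$s_2 = \mathtt{map}^\sharp\ (\lambda x.\mathtt{s}\ x)\ (\mathtt{cons}\ (\mathtt{s}\ 0)\ \mathtt{nil})$ and $t_2 = \mathtt{map}^\sharp\ (\lambda x.\mathtt{s}\ x)\ \mathtt{nil}$.

Note that here $t_1$ reduces to $s_2$ in a single step ($\mathtt{map}\ (\lambda x.x)\ \mathtt{nil} \Rightarrow_R \mathtt{nil}$).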


We have the following key result: 
Theorem 32. Let (F,R) be a PA-AFP AFSM. If (F,R) is non-terminating, 
then there is an infinite (SDP(R), R)-dependency chain. 


Proof (sketch). The proof is an adaptation of the one in [34], altered for the more 
permissive definition of accessible function passing over plain function passing 
as well as the meta-variable conditions; it also follows from Theorem 37 below. 

By this result we can use dependency pairs to prove termination of a given
properly applied and AFP AFSM: if we can prove that there is no infinite 
(SDP(R), R)-chain, then termination follows immediately. Note, however, that 
the reverse result does not hold: it is possible to have an infinite (SDP(R), R)- 
dependency chain even for a terminating PA-AFP AFSM. 


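Example 33. Let $F \supseteq \{\mathtt{0}, \mathtt{1} : \mathtt{nat},\ \mathtt{f} : \mathtt{nat} \to \mathtt{nat},\ \mathtt{g} : (\mathtt{nat} \to \mathtt{nat}) \to \mathtt{nat}\}$ and
$R = \{\mathtt{f}\ \mathtt{0} \Rightarrow \mathtt{g}\ (\lambda x.\mathtt{f}\ x),\ \mathtt{g}\ (\lambda x.F(x)) \Rightarrow F(\mathtt{1})\}$. This AFSM is PA-AFP, with
$SDP(R) = \{\mathtt{f}^\sharp\ \mathtt{0} \Rrightarrow \mathtt{g}^\sharp\ (\lambda x.\mathtt{f}\ x),\ \mathtt{f}^\sharp\ \mathtt{0} \Rrightarrow \mathtt{f}^\sharp\ X\}$; the second rule does not cause the
addition of any dependency pairs. Although $\Rightarrow_R$ is terminating, there is an
infinite (SDP(R), R)-chain $[(\mathtt{f}^\sharp\ \mathtt{0} \Rrightarrow \mathtt{f}^\sharp\ X,\ \mathtt{f}^\sharp\ \mathtt{0},\ \mathtt{f}^\sharp\ \mathtt{0}), (\mathtt{f}^\sharp\ \mathtt{0} \Rrightarrow \mathtt{f}^\sharp\ X,\ \mathtt{f}^\sharp\ \mathtt{0},\ \mathtt{f}^\sharp\ \mathtt{0}), \ldots]$.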


The problem in Example 33 is the non-conservative DP $\mathtt{f}^\sharp\ \mathtt{0} \Rrightarrow \mathtt{f}^\sharp\ X$,
with X on the right but not on the left. Such DPs arise from abstractions in
the right-hand sides of rules. Unfortunately, abstractions are introduced by the
restricted $\eta$-expansion (Definition 15) that we may need to make an AFSM
properly applied. Even so, often all DPs are conservative, as in Examples 6 and 17.
There, we do have the inverse result:


Theorem 34. For any AFSM (F, R): if there is an infinite (SDP(R), R)-chain
$[(\rho_0, s_0, t_0), (\rho_1, s_1, t_1), \ldots]$ with all $\rho_i$ conservative, then $\Rightarrow_R$ is non-terminating.


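Proof (sketch). If $FMV(p_i) \subseteq FMV(\ell_i)$, then we can see that $s_i \Rightarrow_R \cdot \Rightarrow_\beta^* t_i'$ for
some term $t_i'$ of which $t_i$ is a subterm. Since also each $t_i \Rightarrow_R^* s_{i+1}$, the infinite
chain induces an infinite reduction $s_0 \Rightarrow_R^+ t_0' \Rightarrow_R^* s_1' \Rightarrow_R^+ t_1'' \Rightarrow_R^* \cdots$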


The core of the dependency pair framework is to systematically simplify a set 
of pairs (P,R) to prove either absence or presence of an infinite (P, R)-chain, 
thus showing termination or non-termination as appropriate. By Theorems 32 
and 34 we can do so, although with some conditions on the non-termination 
result. We can do better by tracking certain properties of dependency chains. 


Definition 35 (Minimal and Computable chains). Let (F, U) be an AFSM
and $C_U$ an RC-set satisfying the properties of Theorem 13 for (F, U). Let F
contain, for every type $\sigma$, at least countably many symbols $\mathtt{f} : \sigma$ not used in U.

A (P, R)-chain $[(\rho_0, s_0, t_0), (\rho_1, s_1, t_1), \ldots]$ is U-computable if: $\Rightarrow_U \supseteq \Rightarrow_R$,
and for all $i \in \mathbb{N}$ there exists a substitution $\gamma_i$ such that $\rho_i = \ell_i \Rrightarrow p_i\ (A_i)$ with
$s_i = \ell_i\gamma_i$ and $t_i = p_i\gamma_i$, and $(\lambda x_1 \ldots x_n.v)\gamma_i$ is $C_U$-computable for all $v$ and $B$
such that $p_i \unrhd_B v$, $\gamma_i$ respects $B$, and $FV(v) = \{x_1, \ldots, x_n\}$.

A chain is minimal if the strict subterms of all $t_i$ are terminating under $\Rightarrow_R$.


In the first-order DP framework, minimal chains give access to several pow- 
erful techniques to prove absence of infinite chains, such as the subterm criterion 
[24] and usable rules [22,24]. Computable chains go a step further, by building 
on the computability inherent in the proof of Theorem 32 and the notion of 
accessible function passing AFSMs. In computable chains, we can require that 
(some of) the subterms of all $t_i$ are computable rather than merely terminating.
This property will be essential in the computable subterm criterion processor
(Theorem 63). 

Another property of dependency chains is the use of formative rules, which 
has proven very useful for dynamic DPs [31]. Here we go further and con- 
sider formative reductions, which were introduced for the first-order DP frame- 
work in [16]. This property will be essential in the formative rules processor 
(Theorem 58). 


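Definition 36 (Formative chain, formative reduction). A (P, R)-chain
$[(\ell_0 \Rrightarrow p_0\ (A_0), s_0, t_0), (\ell_1 \Rrightarrow p_1\ (A_1), s_1, t_1), \ldots]$ is formative if for all $i$, the
reduction $t_i \Rightarrow_R^* s_{i+1}$ is $\ell_{i+1}$-formative. Here, for a pattern $\ell$, substitution $\gamma$ and
term $s$, a reduction $s \Rightarrow_R^* \ell\gamma$ is $\ell$-formative if one of the following holds:


- $\ell$ is not a fully extended linear pattern; that is: some meta-variable occurs
more than once in $\ell$ or $\ell$ has a sub-meta-term $\lambda x.C[Z(\vec{s})]$ with $x \notin \{\vec{s}\}$

- $\ell$ is a meta-variable application $Z(x_1, \ldots, x_k)$ and $s = \ell\gamma$

- $s = a\ s_1 \cdots s_n$ and $\ell = a\ \ell_1 \cdots \ell_n$ with $a \in F^\sharp \cup V$ and each $s_i \Rightarrow_R^* \ell_i\gamma$ by an
$\ell_i$-formative reduction

- $s = \lambda x.s'$ and $\ell = \lambda x.\ell'$ and $s' \Rightarrow_R^* \ell'\gamma$ by an $\ell'$-formative reduction

- $s = (\lambda x.u)\ v\ w_1 \cdots w_n$ and $u[x := v]\ w_1 \cdots w_n \Rightarrow_R^* \ell\gamma$ by an $\ell$-formative
reduction

- $\ell$ is not a meta-variable application, and there are $\ell' \Rightarrow r' \in R$, meta-variables
$Z_1 \ldots Z_n$ ($n \geq 0$) and $\delta$ such that $s \Rightarrow_R^* (\ell'\ Z_1 \cdots Z_n)\delta$ by an $(\ell'\ Z_1 \cdots Z_n)$-formative
reduction, and $(r'\ Z_1 \cdots Z_n)\delta \Rightarrow_R^* \ell\gamma$ by an $\ell$-formative reduction.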


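The idea of a formative reduction is to avoid redundant steps: if $s \Rightarrow_R^* \ell\gamma$
by an $\ell$-formative reduction, then this reduction takes only the steps
needed to obtain an instance of $\ell$. Suppose that we have rules $\mathtt{plus}\ \mathtt{0}\ Y \Rightarrow Y$
and $\mathtt{plus}\ (\mathtt{s}\ X)\ Y \Rightarrow \mathtt{s}\ (\mathtt{plus}\ X\ Y)$. Let $\ell := \mathtt{g}\ \mathtt{0}\ X$ and $t := \mathtt{plus}\ \mathtt{0}\ \mathtt{0}$. Then the
reduction $\mathtt{g}\ t\ t \Rightarrow_R \mathtt{g}\ \mathtt{0}\ t$ is $\ell$-formative: we must reduce the first argument to
get an instance of $\ell$. The reduction $\mathtt{g}\ t\ t \Rightarrow_R \mathtt{g}\ t\ \mathtt{0} \Rightarrow_R \mathtt{g}\ \mathtt{0}\ \mathtt{0}$ is not $\ell$-formative,
because the reduction in the second argument does not contribute to the
non-meta-variable positions of $\ell$. This matters when we consider $\ell$ as the left-hand
side of a rule, say $\mathtt{g}\ \mathtt{0}\ X \Rightarrow \mathtt{0}$: if we reduce $\mathtt{g}\ t\ t \Rightarrow_R \mathtt{g}\ t\ \mathtt{0} \Rightarrow_R \mathtt{g}\ \mathtt{0}\ \mathtt{0} \Rightarrow_R \mathtt{0}$, then
the first step was redundant: removing this step gives a shorter reduction to the
same result: $\mathtt{g}\ t\ t \Rightarrow_R \mathtt{g}\ \mathtt{0}\ t \Rightarrow_R \mathtt{0}$. In an infinite reduction, redundant steps may
also be postponed indefinitely.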
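
The two reductions of g t t discussed above can be replayed mechanically. The following Python sketch is only a first-order illustration: the tuple encoding of terms, the helper names (match, instantiate, step, reduce_along) and the 1-based position paths are assumptions made for this sketch, and it does not implement Definition 36. It merely shows that the reduction skipping the second argument reaches the same result in fewer steps.

```python
# Terms are nested tuples (symbol, arg1, ..., argn); rule variables are strings.
ZERO = ("0",)

RULES = [
    (("plus", ZERO, "Y"), "Y"),                                  # plus 0 Y => Y
    (("plus", ("s", "X"), "Y"), ("s", ("plus", "X", "Y"))),     # plus (s X) Y => s (plus X Y)
    (("g", ZERO, "X"), ZERO),                                    # g 0 X => 0
]

def match(pattern, term, subst):
    """Return an extension of subst with pattern*subst == term, or None."""
    if isinstance(pattern, str):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def instantiate(pattern, subst):
    """Apply a substitution to a rule side."""
    if isinstance(pattern, str):
        return subst[pattern]
    return (pattern[0],) + tuple(instantiate(p, subst) for p in pattern[1:])

def step(term, path):
    """Rewrite the subterm at `path` (argument indices, 1-based) with the first matching rule."""
    if path:
        i = path[0]
        return term[:i] + (step(term[i], path[1:]),) + term[i + 1:]
    for lhs, rhs in RULES:
        subst = match(lhs, term, {})
        if subst is not None:
            return instantiate(rhs, subst)
    raise ValueError("no rule applies at this position")

def reduce_along(term, paths):
    """Return the list of terms visited when rewriting at the given positions in order."""
    sequence = [term]
    for path in paths:
        term = step(term, path)
        sequence.append(term)
    return sequence

t = ("plus", ZERO, ZERO)
start = ("g", t, t)
formative = reduce_along(start, [(1,), ()])        # g t t => g 0 t => 0
redundant = reduce_along(start, [(2,), (1,), ()])  # g t t => g t 0 => g 0 0 => 0
```

Both sequences end in the same term, but the second one contains the extra step in the second argument that an $\ell$-formative reduction avoids.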


We can now strengthen the result of Theorem 32 with two new properties. 


Theorem 37. Let (F,R) be a properly applied, accessible function passing 
AFSM. If (F,R) is non-terminating, then there is an infinite R-computable 
formative (SDP(R), R)-dependency chain. 


Proof (sketch). We select a minimal non-computable (MNC) term $s := \mathtt{f}\ s_1 \cdots s_k$
(where all $s_i$ are $C_R$-computable) and an infinite reduction starting in $s$. Then we
stepwise build an infinite dependency chain, as follows. Since $s$ is non-computable
but each $s_i$ terminates (as computability implies termination), there exist a rule
$\mathtt{f}\ \ell_1 \cdots \ell_k \Rightarrow r$ and substitution $\gamma$ such that each $s_i \Rightarrow_R^* \ell_i\gamma$ and $r\gamma$ is
non-computable. We can then identify a candidate $t\ (A)$ of $r$ such that $\gamma$ respects
$A$ and $t\gamma$ is a MNC subterm of $r\gamma$; we continue the process with $t\gamma$ (or a term
at its head). For the formative property, we note that if $s \Rightarrow_R^* \ell\gamma$ and $s$ is
terminating, then $s \Rightarrow_R^* \ell\delta$ by an $\ell$-formative reduction for a substitution $\delta$ such
that each $\delta(Z) \Rightarrow_R^* \gamma(Z)$. This follows by postponing those reduction steps not
needed to obtain an instance of $\ell$. The resulting infinite chain is R-computable
because we can show, by induction on the definition of $\unrhd_{\mathtt{acc}}$, that if $\ell \Rightarrow r$
is an AFP rule and $\ell\gamma$ is a MNC term, then $\gamma(Z)$ is $C_R$-computable for all
$Z \in FMV(r)$.


As it is easily seen that all $C_U$-computable terms are $\Rightarrow_U$-terminating and
therefore $\Rightarrow_R$-terminating, every U-computable (P, R)-dependency chain is also
minimal. The notions of R-computable and formative chains still do not suffice
to obtain a true inverse result, however (i.e., to prove that termination implies
the absence of an infinite R-computable chain over SDP(R)): the infinite chain
in Example 33 is R-computable.


To see why the two restrictions that the AFSM must be properly applied and 
accessible function passing are necessary, consider the following examples. 


Example 38. Consider $F \supseteq \{\mathtt{fix} : ((o \to o) \to o \to o) \to o \to o\}$ and $R =
\{\mathtt{fix}\ F\ X \Rightarrow F\ (\mathtt{fix}\ F)\ X\}$. This AFSM is not properly applied; it is also
not terminating, as can be seen by instantiating $F$ with $\lambda y.y$. However, it does
not have any static DPs, since $\mathtt{fix}\ F$ is not a candidate. Even if we altered the
definition of static DPs to admit a dependency pair $\mathtt{fix}^\sharp\ F\ X \Rrightarrow \mathtt{fix}^\sharp\ F$, this
pair could not be used to build an infinite dependency chain.

Note that the problem does not arise if we study the $\eta$-expanded rules $R^\uparrow =
\{\mathtt{fix}\ F\ X \Rightarrow F\ (\lambda z.\mathtt{fix}\ F\ z)\ X\}$, as the dependency pair $\mathtt{fix}^\sharp\ F\ X \Rrightarrow \mathtt{fix}^\sharp\ F\ Z$
does admit an infinite chain. Unfortunately, as the one dependency pair does
not satisfy the conditions of Theorem 34, we cannot use this to prove
non-termination.


Example 39. The AFSM from Example 20 is not accessible function passing,
since $Acc(\mathtt{lm}) = \emptyset$. This is good because the set SDP(R) is empty, which would
lead us to falsely conclude termination without the restriction.


Discussion: Theorem 37 transposes the work of [34,46] to AFSMs and extends 
it by using a more liberal restriction, by limiting interest to formative, R- 
computable chains, and by including meta-variable conditions. Both of these 
new properties of chains will support new termination techniques within the DP 
framework. 

The relationship with the works for functional programming [32,33] is less 
clear: they define a different form of chains suited well to polymorphic systems, 
but which requires more intricate reasoning for non-polymorphic systems, as 
DPs can be used for reductions at the head of a term. It is not clear whether 
there are non-polymorphic systems that can be handled with one and not the 
other. The notions of formative and R-computable chains are not considered 
there; meta-variable conditions are not relevant to their $\lambda$-free formalism.


5 The Static Higher-Order DP Framework


In first-order term rewriting, the DP framework [20] is an extendable framework 
to prove termination and non-termination. As observed in the introduction, DP 
analyses in higher-order rewriting typically go beyond the initial DP approach 
[2], but fall short of the full framework. Here, we define the latter for static DPs. 
Complete versions of all proof sketches in this section are in [17, Appendix C]. 


We have now reduced the problem of termination to non-existence of certain 
chains. In the DP framework, we formalise this in the notion of a DP problem: 


Definition 40 (DP problem). A DP problem is a tuple (P, R, m, f) with P
a set of DPs, R a set of rules, $m \in \{\mathtt{minimal}, \mathtt{arbitrary}\} \cup \{\mathtt{computable}_U \mid$
any set of rules $U\}$, and $f \in \{\mathtt{formative}, \mathtt{all}\}$.$^3$

A DP problem (P, R, m, f) is finite if there exists no infinite (P, R)-chain
that is U-computable if $m = \mathtt{computable}_U$, is minimal if $m = \mathtt{minimal}$, and is
formative if $f = \mathtt{formative}$. It is infinite if R is non-terminating, or if there
exists an infinite (P, R)-chain where all DPs used in the chain are conservative.

To capture the levels of permissiveness in the m flag, we use a
transitive-reflexive relation $\succeq$ generated by $\mathtt{computable}_U \succeq \mathtt{minimal} \succeq \mathtt{arbitrary}$.


Thus, the combination of Theorems 34 and 37 can be rephrased as:
an AFSM (F, R) is terminating if $(SDP(R), R, \mathtt{computable}_R, \mathtt{formative})$ is
finite, and is non-terminating if $(SDP(R), R, m, f)$ is infinite for some $m \in
\{\mathtt{computable}_U, \mathtt{minimal}, \mathtt{arbitrary}\}$ and $f \in \{\mathtt{formative}, \mathtt{all}\}$.$^4$


The core idea of the DP framework is to iteratively simplify a set of DP 
problems via processors until nothing remains to be proved: 


Definition 41 (Processor). A dependency pair processor (or just processor)
is a function that takes a DP problem and returns either NO or a set of DP
problems. A processor Proc is sound if a DP problem M is finite whenever
$Proc(M) \neq \mathtt{NO}$ and all elements of Proc(M) are finite. A processor Proc is
complete if a DP problem M is infinite whenever Proc(M) = NO or contains an
infinite element.


To prove finiteness of a DP problem M with the DP framework, we proceed
analogously to the first-order DP framework [22]: we repeatedly apply sound DP
processors starting from M until none remain. That is, we execute the following
rough procedure: (1) let A := {M}; (2) while $A \neq \emptyset$: select a problem $Q \in A$ and
a sound processor Proc with $Proc(Q) \neq \mathtt{NO}$, and let $A := (A \setminus \{Q\}) \cup Proc(Q)$.
If this procedure terminates, then M is a finite DP problem.
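
The rough procedure above can be sketched as a worklist loop. This is a minimal Python sketch under several assumptions of ours: DP problems are encoded as tuples (P, R, m, f) with P a frozenset, a processor is a function returning either the marker NO or a list of problems, a processor that returns [Q] is treated as not applicable, and the non-deterministic choice is resolved by trying processors in a fixed order with a round limit. The two toy processors are hypothetical stand-ins, not processors from this paper; in particular toy_remove_pair is not claimed to be sound.

```python
NO = "NO"  # marker a processor returns when it detects an infinite problem

def prove_finite(problem, processors, max_rounds=1000):
    """Return True if the agenda is emptied using the given processors, else False."""
    agenda = [problem]                        # (1) A := {M}
    for _ in range(max_rounds):
        if not agenda:                        # (2) stop once A is empty
            return True
        current = agenda.pop()                # select a problem Q in A
        for proc in processors:
            result = proc(current)
            if result != NO and result != [current]:
                agenda.extend(result)         # A := (A \ {Q}) u Proc(Q)
                break
        else:
            return False                      # no processor made progress on Q
    return False

def toy_remove_pair(problem):
    """Hypothetical stand-in for a DP-removing processor (e.g. a subterm criterion)."""
    dps, rules, m, f = problem
    if "dp1" in dps:
        return [(dps - {"dp1"}, rules, m, f)]
    return [problem]

def toy_empty_problem(problem):
    """A problem without DPs admits no chain, so it yields no further problems."""
    dps, rules, m, f = problem
    return [] if not dps else [problem]

START = (frozenset({"dp1"}), "R", "computable_R", "formative")
solved = prove_finite(START, [toy_remove_pair, toy_empty_problem])
stuck = prove_finite(START, [toy_empty_problem])
```

A result of False here only means that the chosen processors and strategy did not succeed; it says nothing about infiniteness of the problem.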


3 Our framework is implicitly parametrised by the signature $F^\sharp$ used for term
formation. As none of the processors we present modify this component (as indeed there
is no need to by Theorem 9), we leave it implicit.

4 The processors in this paper do not alter the flag m, but some require minimality 
or computability. We include the minimal option and the subscript U for the sake of 
future generalisations, and for reuse of processors in the dynamic approach of [31]. 
To prove termination of an AFSM (F, R), we would use as initial DP problem
$(SDP(R), R, \mathtt{computable}_R, \mathtt{formative})$, provided that R is properly applied
and accessible function passing (where $\eta$-expansion following Definition 15 may
be applied first). If the procedure terminates (so finiteness of M is proved by
the definition of soundness), then Theorem 37 provides termination of $\Rightarrow_R$.

Similarly, we can use the DP framework to prove infiniteness: (1) let A :=
{M}; (2) while $A \neq \mathtt{NO}$: select a problem $Q \in A$ and a complete processor Proc,
and let A := NO if Proc(Q) = NO, or $A := (A \setminus \{Q\}) \cup Proc(Q)$ otherwise. For
non-termination of (F, R), the initial DP problem should be (SDP(R), R, m, f),
where m, f can be any flag (see Theorem 34). Note that the algorithms coincide
while processors are used that are both sound and complete. In a tool,
automation (or the user) must resolve the non-determinism and select suitable
processors.


Below, we will present a number of processors within the framework. We will 
typically present processors by writing “for a DP problem M satisfying X, Y, Z, 
Proc(M) =...”. In these cases, we let Proc(M) = {M} for any problem M not 
satisfying the given properties. Many more processors are possible, but we have 
chosen to present a selection which touches on all aspects of the DP framework: 


— processors which map a DP problem to NO (Theorem 65), a singleton set 
(most processors) and a non-singleton set (Theorem 42); 

— changing the set R (Theorems 54, 58) and various flags (Theorem 54); 

— using specific values of the f (Theorem 58) and m flags (Theorems 54, 61, 63); 

— using term orderings (Theorems 49, 52), a key part of many termination 
proofs. 


5.1 The Dependency Graph 


We can leverage reachability information to decompose DP problems. In first- 
order rewriting, a graph structure is used to track which DPs can possibly follow 
one another in a chain [2]. Here, we define this dependency graph as follows. 


Definition 42 (Dependency graph). A DP problem (P, R, m, f) induces a
graph structure DG, called its dependency graph, whose nodes are the elements
of P. There is a (directed) edge from $\rho_1$ to $\rho_2$ in DG iff there exist $s_1, t_1, s_2, t_2$
such that $[(\rho_1, s_1, t_1), (\rho_2, s_2, t_2)]$ is a (P, R)-chain with the properties for m, f.


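Example 43. Consider an AFSM with $F \supseteq \{\mathtt{f} : (\mathtt{nat} \to \mathtt{nat}) \to \mathtt{nat} \to \mathtt{nat}\}$ and
$R = \{\mathtt{f}\ (\lambda x.F(x))\ (\mathtt{s}\ Y) \Rightarrow F(\mathtt{f}\ (\lambda x.\mathtt{0})\ (\mathtt{f}\ (\lambda x.F(x))\ Y))\}$. Let P := SDP(R) =


(1) $\mathtt{f}^\sharp\ (\lambda x.F(x))\ (\mathtt{s}\ Y) \Rrightarrow \mathtt{f}^\sharp\ (\lambda x.\mathtt{0})\ (\mathtt{f}\ (\lambda x.F(x))\ Y)\ (\{F : 1\})$
(2) $\mathtt{f}^\sharp\ (\lambda x.F(x))\ (\mathtt{s}\ Y) \Rrightarrow \mathtt{f}^\sharp\ (\lambda x.F(x))\ Y\ (\{F : 1\})$


The dependency graph of (P, R, minimal, formative) is:


[Figure: dependency graph on the nodes (1) and (2)]


There is no edge from (1) to itself or (2) because there is no substitution $\gamma$
such that $(\lambda x.\mathtt{0})\gamma$ can be reduced to a term $(\lambda x.F(x))\delta$ where $\delta(F)$ regards its
first argument (as $\Rightarrow_R^*$ cannot introduce new variables).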


In general, the dependency graph for a given DP problem is undecidable, 
which is why we consider approximations. 


Definition 44 (Dependency graph approximation [31]). A finite graph $G_\theta$
approximates DG if $\theta$ is a function that maps the nodes of DG to the nodes of
$G_\theta$ such that, whenever DG has an edge from $\rho_1$ to $\rho_2$, $G_\theta$ has an edge from
$\theta(\rho_1)$ to $\theta(\rho_2)$. ($G_\theta$ may have edges that have no corresponding edge in DG.)


Note that this definition allows for an infinite graph to be approximated 
by a finite one; infinite graphs may occur if R is infinite (e.g., the union of all 
simply-typed instances of polymorphic rules). 

If P is finite, we can take a graph approximation $G_{\mathtt{id}}$ with the same nodes
as DG. A simple approximation may have an edge from $\ell_1 \Rrightarrow p_1\ (A_1)$ to $\ell_2 \Rrightarrow
p_2\ (A_2)$ whenever both $p_1$ and $\ell_2$ have the form $\mathtt{f}^\sharp\ s_1 \cdots s_k$ for the same $\mathtt{f}$ and
$k$. However, one can also take the meta-variable conditions into account, as we
did in Example 43.


Theorem 45 (Dependency graph processor). The processor $Proc_{G_\theta}$ that
maps a DP problem M = (P, R, m, f) to $\{(\{\rho \in P \mid \theta(\rho) \in C_i\}, R, m, f) \mid 1 \leq
i \leq n\}$ if $G_\theta$ is an approximation of the dependency graph of M and $C_1, \ldots, C_n$
are the (nodes of the) non-trivial strongly connected components (SCCs) of $G_\theta$,
is both sound and complete.


Proof (sketch). In an infinite (P, R)-chain $[(\rho_0, s_0, t_0), (\rho_1, s_1, t_1), \ldots]$, there is
always a path from $\rho_i$ to $\rho_{i+1}$ in DG. Since $G_\theta$ is finite, every infinite path in
DG eventually remains in a cycle in $G_\theta$. This cycle is part of an SCC.


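Example 46. Let R be the set of rules from Example 43 and G be the graph given
there. Then $Proc_G(SDP(R), R, \mathtt{computable}_R, \mathtt{formative}) = \{(\{\mathtt{f}^\sharp\ (\lambda x.F(x))\
(\mathtt{s}\ Y) \Rrightarrow \mathtt{f}^\sharp\ (\lambda x.F(x))\ Y\ (\{F : 1\})\}, R, \mathtt{computable}_R, \mathtt{formative})\}$.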


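Example 47. Let R consist of the rules for map from Example 6 along with
$\mathtt{f}\ L \Rightarrow \mathtt{map}\ (\lambda x.\mathtt{g}\ x)\ L$ and $\mathtt{g}\ X \Rightarrow X$. Then SDP(R) = {(1) $\mathtt{map}^\sharp\ (\lambda x.Z(x))\ (\mathtt{cons}\ H\ T)
\Rrightarrow \mathtt{map}^\sharp\ (\lambda x.Z(x))\ T$, (2) $\mathtt{f}^\sharp\ L \Rrightarrow \mathtt{map}^\sharp\ (\lambda x.\mathtt{g}\ x)\ L$, (3) $\mathtt{f}^\sharp\ L \Rrightarrow \mathtt{g}^\sharp\ X$}. DP (3)
is not conservative, but it is not on any cycle in the graph approximation $G_{\mathtt{id}}$
obtained by considering head symbols as described above:


[Figure: graph approximation $G_{\mathtt{id}}$ on the nodes (1), (2) and (3)]


As (1) is the only DP on a cycle, $Proc_{G_{\mathtt{id}}}(SDP(R), R, \mathtt{computable}_R,
\mathtt{formative}) = \{(\{(1)\}, R, \mathtt{computable}_R, \mathtt{formative})\}$.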
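
The head-symbol approximation and the SCC decomposition used in this example can be sketched in a few lines of Python. The encoding is an assumption made for illustration: each DP is summarised only by the head symbol and argument count of its two sides, the names head_symbol_graph, nontrivial_sccs and dependency_graph_processor are hypothetical, and SCCs are found by a simple reachability check that is adequate for small graphs but is not an efficient algorithm.

```python
# DP number -> ((lhs head, lhs argument count), (rhs head, rhs argument count))
DPS = {
    1: (("map#", 2), ("map#", 2)),   # map# (\x.Z(x)) (cons H T) => map# (\x.Z(x)) T
    2: (("f#", 1), ("map#", 2)),     # f# L => map# (\x.g x) L
    3: (("f#", 1), ("g#", 1)),       # f# L => g# X
}

def head_symbol_graph(dps):
    """Edge i -> j whenever the rhs of i and the lhs of j share head symbol and arity."""
    return {i: [j for j, (lhs_j, _) in dps.items() if rhs_i == lhs_j]
            for i, (_, rhs_i) in dps.items()}

def reachable(edges, start):
    """Nodes reachable from start in one or more steps."""
    seen, stack = set(), list(edges[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges[node])
    return seen

def nontrivial_sccs(edges):
    """SCCs that contain at least one edge, i.e. whose nodes lie on a cycle."""
    reach = {n: reachable(edges, n) for n in edges}
    return {frozenset(m for m in reach[n] if n in reach[m])
            for n in edges if n in reach[n]}

def dependency_graph_processor(dps, rules, m, f):
    """One sub-problem per non-trivial SCC of the approximated graph."""
    edges = head_symbol_graph(dps)
    return [({i: dps[i] for i in sorted(scc)}, rules, m, f)
            for scc in nontrivial_sccs(edges)]

EDGES = head_symbol_graph(DPS)
RESULT = dependency_graph_processor(DPS, "R", "computable_R", "formative")
```

With this encoding, only DP (1) lies on a cycle, matching the outcome stated in the example.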

Discussion: The dependency graph is a powerful tool for simplifying DP
problems, used since early versions of the DP approach [2]. Our notion of a
dependency graph approximation, taken from [31], strictly generalises the original
notion in [2], which uses a graph on the same node set as DG with possibly
further edges. One can get this notion here by using a graph $G_{\mathtt{id}}$. The advantage
of our definition is that it ensures soundness of the dependency graph processor
also for infinite sets of DPs. This overcomes a restriction in the literature [34,
Corollary 5.13] to dependency graphs without non-cyclic infinite paths.


5.2 Processors Based on Reduction Triples 


At the heart of most DP-based approaches to termination proving lie well- 
founded orderings to delete DPs (or rules). For this, we use reduction triples 
[24,31]. 


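Definition 48 (Reduction triple). A reduction triple $(\succsim, \succcurlyeq, \succ)$ consists of
two quasi-orderings $\succsim$ and $\succcurlyeq$ and a well-founded strict ordering $\succ$ on meta-terms
such that $\succsim$ is monotonic, all of $\succsim, \succcurlyeq, \succ$ are meta-stable (that is, $\ell \succsim r$ implies
$\ell\gamma \succsim r\gamma$ if $\ell$ is a closed pattern and $\gamma$ a substitution on domain $FMV(\ell) \cup
FMV(r)$, and the same for $\succcurlyeq$ and $\succ$), $\Rightarrow_\beta \subseteq \succsim$, and both $\succsim \circ \succ \subseteq \succ$ and
$\succcurlyeq \circ \succ \subseteq \succ$.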


In the first-order DP framework, the reduction pair processor [20] seeks to
orient all rules with $\succsim$ and all DPs with either $\succsim$ or $\succ$; if this succeeds, those
pairs oriented with $\succ$ may be removed. Using reduction triples rather than pairs,
we obtain the following extension to the higher-order setting:


Theorem 49 (Basic reduction triple processor). Let $M = (P_1 \uplus
P_2, R, m, f)$ be a DP problem. If $(\succsim, \succcurlyeq, \succ)$ is a reduction triple such that


1. for all $\ell \Rightarrow r \in R$, we have $\ell \succsim r$;
2. for all $\ell \Rrightarrow p\ (A) \in P_1$, we have $\ell \succ p$;
3. for all $\ell \Rrightarrow p\ (A) \in P_2$, we have $\ell \succcurlyeq p$;


then the processor that maps M to $\{(P_2, R, m, f)\}$ is both sound and complete.


Proof (sketch). For an infinite $(P_1 \uplus P_2, R)$-chain $[(\rho_0, s_0, t_0), (\rho_1, s_1, t_1), \ldots]$ the
requirements provide that, for all $i$: (a) $s_i \succ t_i$ if $\rho_i \in P_1$; (b) $s_i \succcurlyeq t_i$ if $\rho_i \in P_2$;
and (c) $t_i \succsim s_{i+1}$. Since $\succ$ is well-founded, only finitely many DPs can be in $P_1$,
so a tail of the chain is actually an infinite $(P_2, R, m, f)$-chain.


Example 50. Let (F, R) be the (non-$\eta$-expanded) rules from Example 17, and
SDP(R) the DPs from Example 28. From Theorem 49, we get the following
ordering requirements:


$\mathtt{deriv}\ (\lambda x.\mathtt{sin}\ F(x)) \succsim \lambda y.\mathtt{times}\ (\mathtt{deriv}\ (\lambda x.F(x))\ y)\ (\mathtt{cos}\ F(y))$
$\mathtt{deriv}^\sharp\ (\lambda x.\mathtt{sin}\ F(x)) \succ \mathtt{deriv}^\sharp\ (\lambda x.F(x))$


We can handle both requirements by using a polynomial interpretation $\mathcal{J}$ to
$\mathbb{N}$ [15,43], by choosing $\mathcal{J}_{\mathtt{sin}}(n) = n + 1$, $\mathcal{J}_{\mathtt{cos}}(n) = 0$, $\mathcal{J}_{\mathtt{times}}(n_1, n_2) = n_1$,
$\mathcal{J}_{\mathtt{deriv}}(f) = \mathcal{J}_{\mathtt{deriv}^\sharp}(f) = \lambda n.f(n)$. Then the requirements are evaluated to:
$\lambda n.f(n) + 1 \geq \lambda n.f(n)$ and $\lambda n.f(n) + 1 > \lambda n.f(n)$, which holds on $\mathbb{N}$.


Theorem 49 is not ideal since, by definition, the left- and right-hand side of 
a DP may have different types. Such DPs are hard to handle with traditional 
techniques such as HORPO [26] or polynomial interpretations [15,43], as these 
methods compare only (meta-)terms of the same type (modulo renaming of 
sorts). 


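Example 51. Consider the toy AFSM with $R = \{\mathtt{f}\ (\mathtt{s}\ X)\ Y \Rightarrow \mathtt{g}\ X\ Y,\ \mathtt{g}\ X \Rightarrow
\lambda z.\mathtt{f}\ X\ z\}$ and $SDP(R) = \{\mathtt{f}^\sharp\ (\mathtt{s}\ X)\ Y \Rrightarrow \mathtt{g}^\sharp\ X,\ \mathtt{g}^\sharp\ X \Rrightarrow \mathtt{f}^\sharp\ X\ Z\}$. If f and g
both have a type $\mathtt{nat} \to \mathtt{nat} \to \mathtt{nat}$, then in the first DP, the left-hand side has
type nat while the right-hand side has type $\mathtt{nat} \to \mathtt{nat}$. In the second DP, the
left-hand side has type $\mathtt{nat} \to \mathtt{nat}$ and the right-hand side has type nat.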


To be able to handle examples like the one above, we adapt [31, Thm. 5.21] 
by altering the ordering requirements to have base type. 


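Theorem 52 (Reduction triple processor). Let Bot be a set $\{\bot_\sigma : \sigma \mid
\sigma$ a type$\} \subseteq F^\sharp$ of unused constructors, $M = (P_1 \uplus P_2, R, m, f)$ a DP problem
and $(\succsim, \succcurlyeq, \succ)$ a reduction triple such that: (a) for all $\ell \Rightarrow r \in R$, we have
$\ell \succsim r$; and (b) for all $\ell \Rrightarrow p\ (A) \in P_1 \uplus P_2$ with $\ell : \sigma_1 \to \ldots \to \sigma_m \to \iota$ and
$p : \tau_1 \to \ldots \to \tau_n \to \kappa$ we have, for fresh meta-variables $Z_1 : \sigma_1, \ldots, Z_m : \sigma_m$:


- $\ell\ Z_1 \cdots Z_m \succ p\ \bot_{\tau_1} \cdots \bot_{\tau_n}$ if $\ell \Rrightarrow p\ (A) \in P_1$
- $\ell\ Z_1 \cdots Z_m \succcurlyeq p\ \bot_{\tau_1} \cdots \bot_{\tau_n}$ if $\ell \Rrightarrow p\ (A) \in P_2$


Then the processor that maps M to $\{(P_2, R, m, f)\}$ is both sound and complete.


Proof (sketch). If $(\succsim, \succcurlyeq, \succ)$ is such a triple, then for $\mathrel{R} \in \{\succcurlyeq, \succ\}$ define $R'$
as follows: for $s : \sigma_1 \to \ldots \to \sigma_m \to \iota$ and $t : \tau_1 \to \ldots \to \tau_n \to \kappa$, let
$s\ R'\ t$ if for all $u_1 : \sigma_1, \ldots, u_m : \sigma_m$ there exist $w_1 : \tau_1, \ldots, w_n : \tau_n$ such that
$s\ u_1 \cdots u_m\ R\ t\ w_1 \cdots w_n$. Now apply Theorem 49 with the triple $(\succsim, \succcurlyeq', \succ')$.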


Here, the elements of Bot take the role of minimal terms for the ordering. We 
use them to flatten the type of the right-hand sides of ordering requirements, 
which makes it easier to use traditional methods to generate a reduction triple. 

While $\succ$ and $\succcurlyeq$ may still have to orient meta-terms of distinct types, these
are always base types, which we could collapse to a single sort. The only relation
required to be monotonic, $\succsim$, regards pairs of meta-terms of the same type. This
makes it feasible to apply orderings like HORPO or polynomial interpretations.

Both the basic and non-basic reduction triple processor are difficult to use for 
non-conservative DPs, which generate ordering requirements whose right-hand 
side contains a meta-variable not occurring on the left. This is typically difficult 
for traditional techniques, although possible to overcome, by choosing triples 
that do not regard such meta-variables (e.g., via an argument filtering [35,46]): 

Example 53. We apply Theorem 52 on the DP problem $(SDP(R), R,
\mathtt{computable}_R, \mathtt{formative})$ of Example 51. This gives for instance the following
ordering requirements:


$\mathtt{f}\ (\mathtt{s}\ X)\ Y \succsim \mathtt{g}\ X\ Y$ $\qquad$ $\mathtt{f}^\sharp\ (\mathtt{s}\ X)\ Y \succ \mathtt{g}^\sharp\ X\ \bot_{\mathtt{nat}}$
$\mathtt{g}\ X \succsim \lambda z.\mathtt{f}\ X\ z$ $\qquad$ $\mathtt{g}^\sharp\ X\ Y \succcurlyeq \mathtt{f}^\sharp\ X\ Z$


The right-hand side of the last DP uses a meta-variable Z that does not occur on
the left. As neither $\succ$ nor $\succcurlyeq$ are required to be monotonic (only $\succsim$ is), function
symbols do not have to regard all their arguments. Thus, we can use a polynomial
interpretation $\mathcal{J}$ to $\mathbb{N}$ with $\mathcal{J}_{\bot_{\mathtt{nat}}} = 0$, $\mathcal{J}_{\mathtt{s}}(n) = n + 1$ and $\mathcal{J}_h(n_1, n_2) = n_1$ for
$h \in \{\mathtt{f}, \mathtt{f}^\sharp, \mathtt{g}, \mathtt{g}^\sharp\}$. The ordering requirements then translate to $X + 1 \geq X$ and
$\lambda y.X \geq \lambda z.X$ for the rules, and $X + 1 > X$ and $X \geq X$ for the DPs. All
these inequalities on $\mathbb{N}$ are clearly satisfied, so we can remove the first DP. The
remaining problem is quickly dispensed with by the dependency graph processor.
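
The four inequalities of this example can be spot-checked numerically. The Python sketch below is only a sanity check on a finite range of values, not a proof: the function names J_s, J_h and requirements_hold, the bound, and the pointwise comparison used for the abstraction in the second rule are assumptions made for this sketch.

```python
from itertools import product

BOT = 0                      # interpretation of the bottom constant of type nat

def J_s(n):
    """Interpretation of s: n + 1."""
    return n + 1

def J_h(n1, n2):
    """Interpretation shared by f, f#, g and g#: the second argument is ignored."""
    return n1

def requirements_hold(bound=8):
    """Check the ordering requirements for all X, Y, Z, W below the bound."""
    for X, Y, Z, W in product(range(bound), repeat=4):
        rule1 = J_h(J_s(X), Y) >= J_h(X, Y)   # f (s X) Y  >=  g X Y
        rule2 = J_h(X, W) >= J_h(X, W)        # g X  >=  \z. f X z, compared at argument W
        dp1 = J_h(J_s(X), Y) > J_h(X, BOT)    # f# (s X) Y  >  g# X bot
        dp2 = J_h(X, W) >= J_h(X, Z)          # g# X W  >=  f# X Z, with Z unconstrained
        if not (rule1 and rule2 and dp1 and dp2):
            return False
    return True

ALL_OK = requirements_hold()
```

The check for the second DP succeeds for every value of Z precisely because the interpretation ignores the second argument, which is the point made in the example.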


5.3 Rule Removal Without Search for Orderings 


While processors often simplify only P, they can also simplify R. One of the
most powerful techniques in first-order DP approaches that can do this is usable
rules. The idea is that for a given set P of DPs, we only need to consider a subset
UR(P,R) of R. Combined with the dependency graph processor, this makes it 
possible to split a large term rewriting system into a number of small problems. 

In the higher-order setting, simple versions of usable rules have also been 
defined [31,46]. We can easily extend these definitions to AFSMs: 


Theorem 54. Given a DP problem M = (P, R, m, f) with $m \succeq \mathtt{minimal}$ and
R finite, let UR(P, R) be the smallest subset of R such that:


- if a symbol $\mathtt{f}$ occurs in the right-hand side of an element of P or UR(P, R),
and there is a rule $\mathtt{f}\ \ell_1 \cdots \ell_k \Rightarrow r$, then this rule is also in UR(P, R);

- if there exists $\ell \Rightarrow r \in R$ or $\ell \Rrightarrow r\ (A) \in P$ such that $r \unrhd F(s_1, \ldots, s_k)\ t_1 \cdots t_n$
with $s_1, \ldots, s_k$ not all distinct variables or with $n > 0$, then UR(P, R) = R.


Then the processor that maps M to $\{(P, UR(P, R), \mathtt{arbitrary}, \mathtt{all})\}$ is sound.


For the proof we refer to the very similar proofs in [31,46]. 


Example 55. For the set SDP(R) of the ordinal recursion example (Examples 8 
and 29), all rules are usable due to the occurrence of H M in the second DP. 
For the set SDP(R) of the map example (Examples 6 and 31), there are no 
usable rules, since the one DP contains no defined function symbols or applied 
meta-variables. 


This higher-order processor is much less powerful than its first-order version:
if any DP or usable rule has a sub-meta-term of the form $F\ s$ or $F(s_1, \ldots, s_k)$
with $s_1, \ldots, s_k$ not all distinct variables, then all rules are usable. Since applying
a higher-order meta-variable to some argument is extremely common in
higher-order rewriting, the technique is usually not applicable. Also, this processor
imposes a heavy price on the flags: minimality (at least) is required, but is lost;
the formative flag is also lost. Thus, usable rules are often combined with reduc- 
tion triples to temporarily disregard rules, rather than as a way to permanently 
remove rules. 


To address these weaknesses, we consider a processor that uses similar ideas 
to usable rules, but operates from the left-hand sides of rules and DPs rather 
than the right. This adapts the technique from [31] that relies on the new for- 
mative flag. As in the first-order case [16], we use a semantic characterisation 
of formative rules. In practice, we then work with over-approximations of this 
characterisation, analogous to the use of dependency graph approximations in 
Theorem 45. 


Definition 56. A function FR that maps a pattern $\ell$ and a set of rules R to
a set $FR(\ell, R) \subseteq R$ is a formative rules approximation if for all $s$ and $\gamma$: if
$s \Rightarrow_R^* \ell\gamma$ by an $\ell$-formative reduction, then this reduction can be done using only
rules in $FR(\ell, R)$.

We let $FR(P, R) = \bigcup \{FR(\ell_i, R) \mid \mathtt{f}\ \ell_1 \cdots \ell_n \Rrightarrow p\ (A) \in P \wedge 1 \leq i \leq n\}$.

Thus, a formative rules approximation is a subset of R that is sufficient for
a formative reduction: if $s \Rightarrow_R^* \ell\gamma$, then $s \Rightarrow_{FR(\ell, R)}^* \ell\gamma$. It is allowed for there to
exist other formative reductions that do use additional rules.


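Example 57. We define a simple formative rules approximation: (1) $FR(Z, R) =
\emptyset$ if $Z$ is a meta-variable; (2) $FR(\mathtt{f}\ \ell_1 \cdots \ell_m, R) = FR(\ell_1, R) \cup \cdots \cup FR(\ell_m, R)$
if $\mathtt{f} : \sigma_1 \to \ldots \to \sigma_m \to \iota$ and no rules have type $\iota$; (3) $FR(s, R) = R$ otherwise.
This is a formative rules approximation: if $s \Rightarrow_R^* Z\gamma$ by a $Z$-formative reduction,
then $s = Z\gamma$, and if $s \Rightarrow_R^* \mathtt{f}\ \ell_1 \cdots \ell_m\gamma$ and no rules have the same output type as
$s$, then $s = \mathtt{f}\ s_1 \cdots s_m$ and each $s_i \Rightarrow_R^* \ell_i\gamma$ (by an $\ell_i$-formative reduction).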

The following result follows directly from the definition of formative rules. 


Theorem 58 (Formative rules processor). For a formative rules approximation
FR, the processor $Proc_{FR}$ that maps a DP problem $(P, R, m, \mathtt{formative})$
to $\{(P, FR(P, R), m, \mathtt{formative})\}$ is both sound and complete.


Proof (sketch). A processor that only removes rules (or DPs) is always complete.
For soundness, if the chain is formative then each step $t_i \Rightarrow_R^* s_{i+1}$ can be replaced
by $t_i \Rightarrow_{FR(P, R)}^* s_{i+1}$. Thus, the chain can be seen as a $(P, FR(P, R))$-chain.


Example 59. For our ordinal recursion example (Examples 8 and 29), none
of the rules are included when we use the approximation of Example 57
since all rules have output type ord. Thus, $Proc_{FR}$ maps $(SDP(R), R,
\mathtt{computable}_R, \mathtt{formative})$ to $(SDP(R), \emptyset, \mathtt{computable}_R, \mathtt{formative})$. Note: this
example can also be completed without formative rules (see Example 64). Here
we illustrate that, even with a simple formative rules approximation, we can
often delete all rules of a given type.


Formative rules are introduced in [31], and the definitions can be adapted to a
more powerful formative rules approximation than the one sketched in Example
57. Several examples and deeper intuition for the first-order setting are given in
[16].


5.4 Subterm Criterion Processors


Reduction triple processors are powerful, but they exert a computational price: 
we must orient all rules in R. The subterm criterion processor allows us to 
remove DPs without considering R at all. It is based on a projection function 
[24], whose higher-order counterpart [31,34,46] is the following: 


Definition 60. For P a set of DPs, let heads(P) be the set of all symbols $\mathtt{f}$ that
occur as the head of a left- or right-hand side of a DP in P. A projection function
for P is a function $\nu : heads(P) \to \mathbb{N}$ such that for all DPs $\ell \Rrightarrow p\ (A) \in P$, the
function $\bar{\nu}$ with $\bar{\nu}(\mathtt{f}\ s_1 \cdots s_n) = s_{\nu(\mathtt{f})}$ is well-defined both for $\ell$ and for $p$.


Theorem 61 (Subterm criterion processor). The processor $Proc_{\mathtt{subcrit}}$ that
maps a DP problem $(P_1 \uplus P_2, R, m, f)$ with $m \succeq \mathtt{minimal}$ to $\{(P_2, R, m, f)\}$ if
a projection function $\nu$ exists such that $\bar{\nu}(\ell) \rhd \bar{\nu}(p)$ for all $\ell \Rrightarrow p\ (A) \in P_1$ and
$\bar{\nu}(\ell) = \bar{\nu}(p)$ for all $\ell \Rrightarrow p\ (A) \in P_2$, is sound and complete.


Proof (sketch). If the conditions are satisfied, every infinite (P, R)-chain induces
an infinite $\unrhd \cdot \Rightarrow_R^*$ sequence that starts in a strict subterm of $t_1$, contradicting
minimality unless all but finitely many steps are equality. Since every occurrence
of a pair in $P_1$ results in a strict $\rhd$ step, a tail of the chain lies in $P_2$.


Example 62. Using $\nu(\mathtt{map}^\sharp) = 2$, $Proc_{\mathtt{subcrit}}$ maps the DP problem $(\{(1)\},
R, \mathtt{computable}_R, \mathtt{formative})$ from Example 47 to $\{(\emptyset, R, \mathtt{computable}_R,
\mathtt{formative})\}$.
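
The projection in this example can be checked with a small Python sketch. It is a first-order simplification under assumptions of ours: terms are nested tuples, meta-variables and the abstraction argument are opaque strings, and the strict subterm test ignores binders, so it does not cover the full higher-order subterm relation. The names subterms, project and subterm_criterion are hypothetical.

```python
def subterms(term):
    """Yield term and all of its subterms (leaves are plain strings)."""
    yield term
    if isinstance(term, tuple):
        for arg in term[1:]:
            yield from subterms(arg)

def project(term, nu):
    """nu-bar: select argument nu[f] (1-based) of a term f s_1 ... s_n."""
    return term[nu[term[0]]]

def subterm_criterion(dps, nu):
    """Split dps into (strictly decreasing, equal) pairs, or return None if nu does not fit."""
    strict, equal = [], []
    for dp in dps:
        lhs, rhs = dp
        left, right = project(lhs, nu), project(rhs, nu)
        if left == right:
            equal.append(dp)
        elif any(right == sub for sub in subterms(left)):
            strict.append(dp)                # right is a strict subterm of left
        else:
            return None
    return strict, equal

LAM = "lam x. Z(x)"                          # the abstraction, kept opaque in this sketch
DP1 = (("map#", LAM, ("cons", "H", "T")), ("map#", LAM, "T"))

RESULT_2 = subterm_criterion([DP1], {"map#": 2})   # project onto the list argument
RESULT_1 = subterm_criterion([DP1], {"map#": 1})   # project onto the function argument
```

Projecting onto the second argument places the pair in the strictly decreasing part, so it can be removed; projecting onto the first argument only yields equality and removes nothing.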


The subterm criterion can be strengthened, following [34,46], to also handle 
DPs like the one in Example 28. Here, we focus on a new idea. For computable 
chains, we can build on the idea of the subterm criterion to get something more. 


Theorem 63 (Computable subterm criterion processor). The processor
$Proc_{\mathtt{statcrit}}$ that maps a DP problem $(P_1 \uplus P_2, R, \mathtt{computable}_U, f)$ to
$\{(P_2, R, \mathtt{computable}_U, f)\}$ if a projection function $\nu$ exists such that $\bar{\nu}(\ell) \sqsupset \bar{\nu}(p)$
for all $\ell \Rrightarrow p\ (A) \in P_1$ and $\bar{\nu}(\ell) = \bar{\nu}(p)$ for all $\ell \Rrightarrow p\ (A) \in P_2$, is sound
and complete. Here, $\sqsupset$ is the relation on base-type terms with $s \sqsupset t$ if $s \neq t$
and (a) $s \unrhd_{\mathtt{acc}} t$ or (b) a meta-variable $Z$ exists with $s \unrhd_{\mathtt{acc}} Z(x_1, \ldots, x_k)$ and
$t = Z(t_1, \ldots, t_k)\ s_1 \cdots s_n$.


Proof (sketch). By the conditions, every infinite $(P, R)$-chain induces an infinite
$(\Rrightarrow_{C_U} \cup \Rightarrow_\beta)^* \cdot \Rightarrow_R^*$ sequence (where $C_U$ is defined following Theorem 13). This
contradicts computability unless there are only finitely many inequality steps.
As pairs in $P_1$ give rise to a strict decrease, they may occur only finitely often.


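Example 64. Following Examples 8 and 29, consider the projection function
$\nu$ with $\nu(\mathtt{rec}^\sharp) = 1$. As $\mathtt{s}\ X \unrhd_{acc} X$ and $\mathtt{lim}\ H \unrhd_{acc} H$, both $\mathtt{s}\ X \sqsupset X$
and $\mathtt{lim}\ H \sqsupset H\ M$ hold. Thus $\mathrm{Proc}_{\mathtt{statcrit}}(P, R, \mathtt{computable}_R, \mathtt{formative}) =
\{(\emptyset, R, \mathtt{computable}_R, \mathtt{formative})\}$. By the dependency graph processor, the
AFSM is terminating.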


The computable subterm criterion processor fundamentally relies on the new 
$\mathtt{computable}_U$ flag, so it has no counterpart in the literature so far.


5.5 Non-termination


While (most of) the processors presented so far are complete, none of them can 
actually return NO. We have not yet implemented such a processor; however, we 
can already provide a general specification of a non-termination processor. 


Theorem 65 (Non-termination processor). Let M = (P,R,m, f) be a DP 
problem. The processor that maps M to NO if it determines that a sufficient 
criterion for non-termination of $\Rightarrow_R$ or for existence of an infinite conservative
(P, R)-chain according to the flags m and f holds is sound and complete. 


Proof. Obvious. 


This is a very general processor, which does not tell us how to determine 
such a sufficient criterion. However, it allows us to conclude non-termination as 
part of the framework by identifying a suitable infinite chain. 


Example 66. If we can find a finite $(P, R)$-chain $[(\rho_0, s_0, t_0), \ldots, (\rho_n, s_n, t_n)]$
with $t_n = s_0\gamma$ for some substitution $\gamma$ which uses only conservative DPs,
is formative if $f = \mathtt{formative}$ and is $U$-computable if $m = \mathtt{computable}_U$,
such a chain is clearly a sufficient criterion: there is an infinite chain
$[(\rho_0, s_0, t_0), \ldots, (\rho_0, s_0\gamma, t_0\gamma), \ldots, (\rho_0, s_0\gamma\gamma, t_0\gamma\gamma), \ldots]$. If $m = \mathtt{minimal}$ and we
find such a chain that is however not minimal, then note that $\Rightarrow_R$ is non-terminating,
which also suffices.

For example, for a DP problem $(P, R, \mathtt{minimal}, \mathtt{all})$ with $P = \{f^\sharp\ F\ X \Rrightarrow
g^\sharp\ (F\ X),\ g^\sharp\ X \Rrightarrow f^\sharp\ h\ X\}$, there is a finite dependency chain: $[(f^\sharp\ F\ X \Rrightarrow
g^\sharp\ (F\ X),\ f^\sharp\ h\ x,\ g^\sharp\ (h\ x)),\ (g^\sharp\ X \Rrightarrow f^\sharp\ h\ X,\ g^\sharp\ (h\ x),\ f^\sharp\ h\ (h\ x))]$. As $f^\sharp\ h\ (h\ x)$
is an instance of $f^\sharp\ h\ x$, the processor maps this DP problem to NO.
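
The instance test that closes the loop in Example 66 amounts to matching the first term of the chain against the last one. The following sketch is a simplified first-order approximation (no binders, no meta-variable applications, no types); the term encoding and the name `match` are assumptions made for illustration.

```python
# Compound terms are tuples ("symbol", arg1, ..., argn); variables are strings.

def match(pattern, term, subst):
    """Return a substitution extending subst that maps pattern onto term,
    or None if term is not an instance of pattern."""
    if isinstance(pattern, str):               # pattern is a variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if (isinstance(term, str) or pattern[0] != term[0]
            or len(pattern) != len(term)):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

s0 = ("f#", ("h",), "?x")                      # f# h x
tn = ("f#", ("h",), ("h", "?x"))               # f# h (h x)
gamma = match(s0, tn, {})                      # substitution with s0*gamma == tn
```

Here `gamma` maps `?x` to `h ?x`, which is the substitution that lets the finite chain repeat indefinitely. A real non-termination processor would additionally have to check the conservativity, formativeness and computability side conditions named in the example.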


To instantiate Theorem 65, we can borrow non-termination criteria from first- 
order rewriting [13,21,42], with minor adaptions to the typed setting. Of course, 
it is worthwhile to also investigate dedicated higher-order non-termination 
criteria. 


6 Conclusions and Future Work 


We have built on the static dependency pair approach [6,33,34,46] and formu- 
lated it in the language of the DP framework from first-order rewriting [20, 22]. 
Our formulation is based on AFSMs, a dedicated formalism designed to make 
termination proofs transferrable to various higher-order rewriting formalisms. 
This framework has two important additions over existing higher-order DP 
approaches in the literature. First, we consider not only arbitrary and minimally 
non-terminating dependency chains, but also minimally non-computable chains; 
this is tracked by the $\mathtt{computable}_U$ flag. Using the flag, a dedicated processor
allows us to efficiently handle rules like Example 8. This flag has no counterpart 
in the first-order setting. Second, we have generalised the idea of formative rules 
in [31] to a notion of formative chains, tracked by a formative flag. This makes 
it possible to define a corresponding processor that permanently removes rules. 

Implementation and Experiments. To provide a strong formal groundwork, we
have presented several processors in a general way, using semantic definitions of, 
e.g., the dependency graph approximation and formative rules rather than syn- 
tactic definitions using functions like TCap [21]. Even so, most parts of the DP 
framework for AFSMs have been implemented in the open-source termination 
prover WANDA [28], alongside a dynamic DP framework [31] and a mechanism 
to delegate some ordering constraints to a first-order tool [14]. For reduction 
triples, polynomial interpretations [15] and a version of HORPO [29, Ch. 5] are 
used. To solve the constraints arising in the search for these orderings, and also to 
determine sort orderings (for the accessibility relation) and projection functions 
(for the subterm criteria), WANDA employs an external SAT-solver. WANDA 
has won the higher-order category of the International Termination Competi- 
tion [50] four times. In the International Confluence Competition [10], the tools 
ACPH [40] and CSI^ho [38] use WANDA as their “oracle” for termination proofs
on HRSs. 

We have tested WANDA on the Termination Problems Data Base [49], using 
AProVE [19] and MiniSat [12] as back-ends. When no additional features are 
enabled, WANDA proves termination of 124 (out of 198) benchmarks with static 
DPs, versus 92 with only a search for reduction orderings; a 34% increase. When 
all features except static DPs are enabled, WANDA succeeds on 153 benchmarks, 
versus 166 with also static DPs; an 8% increase, or alternatively, a 29% decrease 
in failure rate. The full evaluation is available in [17, Appendix D]. 
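
The percentages quoted above can be rechecked from the benchmark counts alone. The snippet below is only an arithmetic check; the variable names are illustrative.

```python
total = 198                      # benchmarks in the test set

# No additional features: reduction orderings only vs. static DPs
orderings_only, with_static = 92, 124
gain_basic = (with_static - orderings_only) / orderings_only

# All features: without vs. with static DPs
full_without_static, full_with_static = 153, 166
gain_full = (full_with_static - full_without_static) / full_without_static

# Failure rate: benchmarks left unsolved in each full configuration
failures_before = total - full_without_static    # 45
failures_after = total - full_with_static        # 32
failure_drop = (failures_before - failures_after) / failures_before

print(f"{gain_basic:.1%} {gain_full:.1%} {failure_drop:.1%}")
```

The first ratio comes out at roughly 34.8%, which the text reports as 34%; the other two round to the stated 8% and 29%.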


Future Work. While the static and the dynamic DP approaches each have their 
own strengths, there has thus far been little progress on a unified approach, 
which could take advantage of the syntactic benefits of both styles. We plan to 
combine the present work with the ideas of [31] into such a unified DP framework. 

In addition, we plan to extend the higher-order DP framework to rewriting 
with strategies, such as implicit β-normalisation or strategies inspired by functional
programming languages like OCaml and Haskell. Other natural directions
are dedicated automation to detect non-termination, and reducing the number of 
term constraints solved by the reduction triple processor via a tighter integration 
with usable and formative rules with respect to argument filterings. 


References 


1. Aczel, P.: A general Church-Rosser theorem. Unpublished Manuscript, University of Manchester (1978)

2. Arts, T., Giesl, J.: Termination of term rewriting using dependency pairs. Theor. Comput. Sci. 236(1-2), 133-178 (2000). https://doi.org/10.1016/S0304-3975(99)00207-8

3. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, Cambridge (1998)

4. Bachmair, L., Ganzinger, H.: Rewrite-based equational theorem proving with selection and simplification. J. Logic Comput. 4(3), 217-247 (1994). https://doi.org/10.1093/logcom/4.3.217


5. Blanqui, F.: Termination and confluence of higher-order rewrite systems. In: Bachmair, L. (ed.) RTA 2000. LNCS, vol. 1833, pp. 47-61. Springer, Heidelberg (2000). https://doi.org/10.1007/10721975_4

6. Blanqui, F.: Higher-order dependency pairs. In: Proceedings of the WST 2006 (2006)

7. Blanqui, F.: Termination of rewrite relations on λ-terms based on Girard’s notion of reducibility. Theor. Comput. Sci. 611, 50-86 (2016). https://doi.org/10.1016/j.tcs.2015.07.045

8. Blanqui, F., Jouannaud, J., Okada, M.: Inductive-data-type systems. Theor. Comput. Sci. 272(1-2), 41-68 (2002). https://doi.org/10.1016/S0304-3975(00)00347-9

9. Blanqui, F., Jouannaud, J., Rubio, A.: The computability path ordering. Logical Methods Comput. Sci. 11(4) (2015). https://doi.org/10.2168/LMCS-11(4:3)2015

10. Community. The International Confluence Competition (CoCo) (2018). http://project-coco.uibk.ac.at/

11. Dershowitz, N., Kaplan, S.: Rewrite, rewrite, rewrite, rewrite, rewrite. In: Conference Record of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, Austin, Texas, USA, 11-13 January 1989, pp. 250-259. ACM Press (1989). https://doi.org/10.1145/75277.75299

12. Eén, N., Sörensson, N.: An extensible SAT-solver. In: Giunchiglia, E., Tacchella, A. (eds.) SAT 2003. LNCS, vol. 2919, pp. 502-518. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24605-3_37

13. Emmes, F., Enger, T., Giesl, J.: Proving non-looping non-termination automatically. In: Gramlich, B., Miller, D., Sattler, U. (eds.) IJCAR 2012. LNCS (LNAI), vol. 7364, pp. 225-240. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31365-3_19

14. Fuhs, C., Kop, C.: Harnessing first order termination provers using higher order dependency pairs. In: Tinelli, C., Sofronie-Stokkermans, V. (eds.) FroCoS 2011. LNCS (LNAI), vol. 6989, pp. 147-162. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24364-6_11

15. Fuhs, C., Kop, C.: Polynomial interpretations for higher-order rewriting. In: Tiwari, A. (ed.) 23rd International Conference on Rewriting Techniques and Applications (RTA 2012), RTA 2012. LIPIcs, vol. 15, Nagoya, Japan, 28 May-2 June 2012, pp. 176-192. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2012). https://doi.org/10.4230/LIPIcs.RTA.2012.176

16. Fuhs, C., Kop, C.: First-order formative rules. In: Dowek, G. (ed.) RTA 2014. LNCS, vol. 8560, pp. 240-256. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08918-8_17

17. Fuhs, C., Kop, C.: A static higher-order dependency pair framework (extended version). Technical report arXiv:1902.06733 [cs.LO], CoRR (2019)

18. Fuhs, C., Kop, C., Nishida, N.: Verifying procedural programs via constrained rewriting induction. ACM Trans. Comput. Logic 18(2), 14:1-14:50 (2017). https://doi.org/10.1145/3060143

19. Giesl, J., et al.: Analyzing program termination and complexity automatically with AProVE. J. Autom. Reasoning 58(1), 3-31 (2017). https://doi.org/10.1007/s10817-016-9388-y

20. Giesl, J., Thiemann, R., Schneider-Kamp, P.: The dependency pair framework: combining techniques for automated termination proofs. In: Baader, F., Voronkov, A. (eds.) LPAR 2005. LNCS (LNAI), vol. 3452, pp. 301-331. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-32275-7_21


21. Giesl, J., Thiemann, R., Schneider-Kamp, P.: Proving and disproving termination of higher-order functions. In: Gramlich, B. (ed.) FroCoS 2005. LNCS (LNAI), vol. 3717, pp. 216-231. Springer, Heidelberg (2005). https://doi.org/10.1007/11559306_12

22. Giesl, J., Thiemann, R., Schneider-Kamp, P., Falke, S.: Mechanizing and improving dependency pairs. J. Autom. Reasoning 37(3), 155-203 (2006). https://doi.org/10.1007/s10817-006-9057-7

23. Haftmann, F., Nipkow, T.: Code generation via higher-order rewrite systems. In: Blume, M., Kobayashi, N., Vidal, G. (eds.) FLOPS 2010. LNCS, vol. 6009, pp. 103-117. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12251-4_9

24. Hirokawa, N., Middeldorp, A.: Tyrolean termination tool: techniques and features. Inf. Comput. 205(4), 474-511 (2007). https://doi.org/10.1016/j.ic.2006.08.010

25. Hoe, J.C., Arvind: Hardware synthesis from term rewriting systems. In: Silveira, L.M., Devadas, S., Reis, R. (eds.) VLSI: Systems on a Chip. IFIPAICT, vol. 34, pp. 595-619. Springer, Boston (2000). https://doi.org/10.1007/978-0-387-35498-9_52

26. Jouannaud, J., Rubio, A.: The higher-order recursive path ordering. In: 14th Annual IEEE Symposium on Logic in Computer Science, Trento, Italy, 2-5 July 1999, pp. 402-411. IEEE Computer Society (1999). https://doi.org/10.1109/LICS.1999.782635

27. Klop, J., Oostrom, V.V., Raamsdonk, F.V.: Combinatory reduction systems: introduction and survey. Theor. Comput. Sci. 121(1-2), 279-308 (1993). https://doi.org/10.1016/0304-3975(93)90091-7

28. Kop, C.: WANDA - a higher-order termination tool. http://wandahot.sourceforge.net/

29. Kop, C.: Higher order termination. Ph.D. thesis, VU Amsterdam (2012)

30. Kop, C., van Raamsdonk, F.: Higher order dependency pairs for algebraic functional systems. In: Schmidt-Schauß, M. (ed.) Proceedings of the 22nd International Conference on Rewriting Techniques and Applications, RTA 2011. LIPIcs, vol. 10, Novi Sad, Serbia, 30 May-1 June 2011, pp. 203-218. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2011). https://doi.org/10.4230/LIPIcs.RTA.2011.203

31. Kop, C., van Raamsdonk, F.: Dynamic dependency pairs for algebraic functional systems. Logical Methods Comput. Sci. 8(2), 10:1-10:51 (2012). https://doi.org/10.2168/LMCS-8(2:10)2012

32. Kusakari, K.: Static dependency pair method in rewriting systems for functional programs with product, algebraic data, and ML-polymorphic types. IEICE Trans. 96-D(3), 472-480 (2013). https://doi.org/10.1587/transinf.E96.D.472

33. Kusakari, K.: Static dependency pair method in functional programs. IEICE Trans. Inf. Syst. E101.D(6), 1491-1502 (2018). https://doi.org/10.1587/transinf.2017FOP0004

34. Kusakari, K., Isogai, Y., Sakai, M., Blanqui, F.: Static dependency pair method based on strong computability for higher-order rewrite systems. IEICE Trans. Inf. Syst. 92(10), 2007-2015 (2009). https://doi.org/10.1587/transinf.E92.D.2007

35. Kusakari, K., Nakamura, M., Toyama, Y.: Argument filtering transformation. In: Nadathur, G. (ed.) PPDP 1999. LNCS, vol. 1702, pp. 47-61. Springer, Heidelberg (1999). https://doi.org/10.1007/10704567_3

36. Meadows, C.A.: Applying formal methods to the analysis of a key management protocol. J. Comput. Secur. 1(1), 5-36 (1992). https://doi.org/10.3233/JCS-1992-1102


37. Miller, D.: A logic programming language with lambda-abstraction, function variables, and simple unification. J. Logic Comput. 1(4), 497-536 (1991). https://doi.org/10.1093/logcom/1.4.497

38. Nagele, J.: CoCo 2018 participant: CSI^ho 0.2 (2018). http://project-coco.uibk.ac.at/2018/papers/csiho.pdf

39. Nipkow, T.: Higher-order critical pairs. In: Proceedings of the Sixth Annual Symposium on Logic in Computer Science (LICS 1991), Amsterdam, The Netherlands, 15-18 July 1991, pp. 342-349. IEEE Computer Society (1991). https://doi.org/10.1109/LICS.1991.151658

40. Onozawa, K., Kikuchi, K., Aoto, T., Toyama, Y.: ACPH: system description for CoCo 2017 (2017). http://project-coco.uibk.ac.at/2017/papers/acph.pdf

41. Otto, C., Brockschmidt, M., von Essen, C., Giesl, J.: Automated termination analysis of Java Bytecode by term rewriting. In: Lynch, C. (ed.) Proceedings of the 21st International Conference on Rewriting Techniques and Applications, RTA 2010. LIPIcs, vol. 6, Edinburgh, Scotland, UK, 11-13 July 2010, pp. 259-276. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2010). https://doi.org/10.4230/LIPIcs.RTA.2010.259

42. Payet, É.: Loop detection in term rewriting using the eliminating unfoldings. Theor. Comput. Sci. 403(2-3), 307-327 (2008). https://doi.org/10.1016/j.tcs.2008.05.013

43. van de Pol, J.: Termination of higher-order rewrite systems. Ph.D. thesis, University of Utrecht (1996)

44. Sakai, M., Kusakari, K.: On dependency pair method for proving termination of higher-order rewrite systems. IEICE Trans. Inf. Syst. E88-D(3), 583-593 (2005)

45. Sakai, M., Watanabe, Y., Sakabe, T.: An extension of the dependency pair method for proving termination of higher-order rewrite systems. IEICE Trans. Inf. Syst. E84-D(8), 1025-1032 (2001)

46. Suzuki, S., Kusakari, K., Blanqui, F.: Argument filterings and usable rules in higher-order rewrite systems. IPSJ Trans. Program. 4(2), 1-12 (2011)

47. Tait, W.: Intensional interpretation of functionals of finite type. J. Symbolic Logic 32(2), 187-199 (1967)

48. Terese: Term Rewriting Systems. Cambridge Tracts in Theoretical Computer Science, vol. 55. Cambridge University Press, Cambridge (2003)

49. Wiki: Termination Problems DataBase (TPDB). http://termination-portal.org/wiki/TPDB

50. Wiki: The International Termination Competition (TermComp) (2018). http://termination-portal.org/wiki/Termination_Competition



Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the chapter’s 
Creative Commons license, unless indicated otherwise in a credit line to the material. If 
material is not included in the chapter’s Creative Commons license and your intended 
use is not permitted by statutory regulation or exceeds the permitted use, you will 
need to obtain permission directly from the copyright holder. 




Coinduction in Uniform: Foundations 
for Corecursive Proof Search 
with Horn Clauses 


Henning Basold¹, Ekaterina Komendantskaya², and Yue Li²


1 CNRS, ENS Lyon, Lyon, France 
henning.basold@ens-lyon.fr 
2 Heriot-Watt University, Edinburgh, UK 
{ek19,y155}@hw.ac.uk 


Abstract. We establish proof-theoretic, constructive and coalgebraic 
foundations for proof search in coinductive Horn clause theories. Opera- 
tional semantics of coinductive Horn clause resolution is cast in terms of 
coinductive uniform proofs; its constructive content is exposed via sound- 
ness relative to an intuitionistic first-order logic with recursion controlled 
by the later modality; and soundness of both proof systems is proven rel- 
ative to a novel coalgebraic description of complete Herbrand models. 


Keywords: Horn clause logic · Coinduction · Uniform proofs


1 Introduction


Horn clause logic is a Turing complete and constructive fragment of first-order 
logic that plays a central role in verification [22], automated theorem proving [52,
53,57] and type inference. Examples of the latter can be traced from the Hindley- 
Milner type inference algorithm [55,73], to more recent uses of Horn clauses in 
Haskell type classes [26,51] and in refinement types [28,43]. Its popularity can 
be attributed to well-understood fixed point semantics and an efficient semi- 
decidable resolution procedure for automated proof search. 

According to the standard fixed point semantics [34,52], given a set P of 
Horn clauses, the least Herbrand model for P is the set of all (finite) ground 
atomic formulae inductively entailed by P. For example, the two clauses below 
define the set of natural numbers in the least Herbrand model. 


$\kappa_{\mathrm{nat0}} : \mathrm{nat}\ 0$


$\kappa_{\mathrm{nats}} : \forall x.\ \mathrm{nat}\ x \to \mathrm{nat}\ (s\ x)$

Formally, the least Herbrand model for the above two clauses is the set of ground
atomic formulae obtained by taking a (forward) closure of the above two clauses.
The model for nat is given by $\mathcal{N} = \{\mathrm{nat}\ 0,\ \mathrm{nat}\ (s\ 0),\ \mathrm{nat}\ (s\ (s\ 0)), \ldots\}$.

We can also view Horn clauses coinductively. The greatest complete Herbrand
model for a set P of Horn clauses is the largest set of finite and infinite ground
atomic formulae coinductively entailed by P. For example, the greatest complete
Herbrand model for the above two clauses is the set


$\mathcal{N}^{\infty} = \mathcal{N} \cup \{\mathrm{nat}\ (s\ (s\ (\cdots)))\}$,


obtained by taking a backward closure of the above two inference rules on the set
of all finite and infinite ground atomic formulae. The greatest Herbrand model is
the largest set of finite ground atomic formulae coinductively entailed by P. In
our example, it would be given by $\mathcal{N}$ already. Finally, one can also consider the
least complete Herbrand model, which interprets entailment inductively but over
potentially infinite terms. In the case of nat, this interpretation does not differ
from $\mathcal{N}$. However, finite paths in coinductive structures like transition systems,
for example, require such semantics.
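
The difference between forward and backward closure for the two nat clauses can be made concrete with a small sketch. It rests on two encoding assumptions: the formula nat (s^n 0) is represented by the integer n, and the sentinel `OMEGA` stands for the infinite term s (s (...)). The forward closure is cut off at a bound so that it terminates.

```python
OMEGA = "omega"   # assumed stand-in for the infinite term s (s (...))

def step(facts):
    """One forward application of the clauses: nat 0, and nat x -> nat (s x)."""
    return {0} | {n + 1 for n in facts}

def least_model_upto(bound):
    """Forward closure of the two clauses, restricted to terms of depth <= bound."""
    facts = set()
    while True:
        new = {n for n in step(facts) if n <= bound}
        if new == facts:
            return facts
        facts = new

def pred(t):
    """The argument x of a formula nat (s x); OMEGA is its own predecessor."""
    return OMEGA if t == OMEGA else t - 1

def backward_closed(facts):
    """Every formula is the head of a clause instance whose body is in facts."""
    return all(t == 0 or pred(t) in facts for t in facts)

finite_part = least_model_upto(3)
```

The set `{OMEGA}` is backward closed although no forward step ever produces it, which is why the infinite formula belongs to the greatest complete model but not to the least one.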

The need for coinductive semantics of Horn clauses arises in several scenarios: 
the Horn clause theory may explicitly define a coinductive data structure or a 
coinductive relation. However, it may also happen that a Horn clause theory, 
which is not explicitly intended as coinductive, nevertheless gives rise to infinite 
inference by resolution and has an interesting coinductive model. This commonly 
happens in type inference. We will illustrate all these cases by means of examples. 


Horn Clause Theories as Coinductive Data Type Declarations. The following 
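clause defines, together with $\kappa_{\mathrm{nat0}}$ and $\kappa_{\mathrm{nats}}$, the type of streams over natural
numbers.


$\kappa_{\mathrm{stream}} : \forall x\,y.\ \mathrm{nat}\ x \wedge \mathrm{stream}\ y \to \mathrm{stream}\ (\mathrm{scons}\ x\ y)$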


This Horn clause does not have a meaningful inductive, i.e. least fixed point, 
model. The greatest Herbrand model of the clauses is given by 


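$\mathcal{S} = \mathcal{N}^{\infty} \cup \{\mathrm{stream}\ (\mathrm{scons}\ x_0\ (\mathrm{scons}\ x_1 \cdots)) \mid \mathrm{nat}\ x_0,\ \mathrm{nat}\ x_1, \ldots \in \mathcal{N}\}$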


In trying to prove, for example, the goal (stream x), a goal-directed proof
search may try to find a substitution for x that will make (stream x) valid
relative to the coinductive model of this set of clauses. This search by
resolution may proceed by means of an infinite reduction


$\mathrm{stream}\ x \ \stackrel{\kappa_{\mathrm{stream}}:[\mathrm{scons}\ y\ x'/x]}{\leadsto}\ \mathrm{nat}\ y \wedge \mathrm{stream}\ x' \ \stackrel{\kappa_{\mathrm{nat0}}:[0/y]}{\leadsto}\ \mathrm{stream}\ x' \ \stackrel{\kappa_{\mathrm{stream}}:[\mathrm{scons}\ y'\ x''/x']}{\leadsto}\ \cdots$,


thereby generating a stream Z of zeros via composition of the computed substitutions:
$Z = (\mathrm{scons}\ 0\ x')[\mathrm{scons}\ 0\ x''/x'] \cdots$. Above, we annotated each resolution step
with the label of the clause it resolves against and the computed substitution. A
method to compute an answer for this infinite sequence of reductions was given
by Gupta et al. [41] and Simon et al. [69]: the loop formed by the goals stream x
and stream x' gives rise to the
circular unifier x = scons 0 x that corresponds to the infinite term Z. It is proven
that, if a loop and a corresponding circular unifier are detected, they provide an
answer that is sound relative to the greatest complete Herbrand model of the
clauses. This approach is known under the name of CoLP.
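
To see how the circular unifier x = scons 0 x stands for the infinite term Z, one can unfold it to any finite depth. The sketch below only illustrates this reading of a circular substitution; it is not the CoLP algorithm, and the term encoding and the name `unfold` are assumptions.

```python
def unfold(term, subst, depth):
    """Unfold a (possibly circular) substitution up to a finite depth.
    Variables are strings; compound terms are tuples (symbol, arg1, ...)."""
    if isinstance(term, str):
        if term in subst and depth > 0:
            return unfold(subst[term], subst, depth - 1)
        return term                      # unbound variable or depth exhausted
    return (term[0],) + tuple(unfold(a, subst, depth) for a in term[1:])

# The circular unifier x = scons 0 x
circular = {"x": ("scons", ("0",), "x")}
approx = unfold("x", circular, 3)        # scons 0 (scons 0 (scons 0 x))
```

Each additional unit of depth exposes one more zero, so the finite approximations converge to the stream Z of zeros.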


Horn Clause Theories in Type Inference. The clauses below give the typing rules of
the simply typed λ-calculus, and may be used for type inference or type checking:


$\kappa_{t1} : \forall x\,\Gamma\,a.\ \mathrm{var}\ x \wedge \mathrm{find}\ \Gamma\ x\ a \to \mathrm{typed}\ \Gamma\ x\ a$
$\kappa_{t2} : \forall x\,\Gamma\,a\,m\,b.\ \mathrm{typed}\ [x : a \mid \Gamma]\ m\ b \to \mathrm{typed}\ \Gamma\ (\lambda\,x\,m)\ (a \to b)$
$\kappa_{t3} : \forall \Gamma\,a\,m\,n\,b.\ \mathrm{typed}\ \Gamma\ m\ (a \to b) \wedge \mathrm{typed}\ \Gamma\ n\ a \to \mathrm{typed}\ \Gamma\ (\mathrm{app}\ m\ n)\ b$


It is well known that the Y-combinator is not typable in the simply typed
λ-calculus and, in particular, self-application λx. x x is not typable either. However,
by switching off the occurs-check in Prolog or by allowing circular unifiers
in CoLP [41,69], we can resolve the goal “typed [] (λ x (app x x)) a” and would
compute the circular substitution a = b → c, b = b → c, suggesting that an
infinite, or circular, type may be able to type this λ-term. A similar trick would
provide a typing for the Y-combinator. Thus, a coinductive interpretation of the
above Horn clauses yields a theory of infinite types, while an inductive interpretation
corresponds to the standard type system of the simply typed λ-calculus.


Horn Clause Theories in Type Class Inference. Haskell type class inference does 
not require circular unifiers but may require a cyclic resolution inference [37,51]. 
Consider, for example, the following mutually defined data structures in Haskell. 


data OddList a = OCons a (EvenList a) 
data EvenList a = Nil | ECons a (OddList a) 


This type declaration gives rise to the following equality class instance declarations,
where we omit the bodies, which are irrelevant here.


instance (Eq a, Eq (EvenList a)) => Eq (OddList a) where
instance (Eq a, Eq (OddList a)) => Eq (EvenList a) where


The above two type class instance declarations have the shape of Horn clauses. 
Since the two declarations mutually refer to each other, an instance inference 
for, e.g., Eq (OddList Int) will give rise to an infinite resolution that alternates 
between the subgoals Eq (OddList Int) and Eq (EvenList Int). The solution 
is to terminate the computation as soon as the cycle is detected [51], and this 
method has been shown sound relative to the greatest Herbrand models in [36]. 
We will demonstrate this later in the proof systems proposed in this paper. 
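
The cycle detection described above can be sketched as a resolution procedure that closes a branch as soon as a goal repeats on the current path. This is an illustrative model, not the algorithm of any Haskell compiler: goals are plain strings, the clauses are the two instance declarations instantiated at a = Int, and the base instance `Eq Int` is an assumption added so that the example is closed.

```python
# Ground Horn clauses "head <- body" derived from the instance declarations.
INSTANCES = {
    "Eq (OddList Int)": ["Eq Int", "Eq (EvenList Int)"],
    "Eq (EvenList Int)": ["Eq Int", "Eq (OddList Int)"],
    "Eq Int": [],                      # assumed base instance
}

def resolve(goal, seen=frozenset()):
    """Resolve goal against INSTANCES, succeeding when a cycle is detected."""
    if goal in seen:
        return True                    # goal repeats on this path: close the branch
    if goal not in INSTANCES:
        return False                   # no instance declaration applies
    return all(resolve(sub, seen | {goal}) for sub in INSTANCES[goal])

odd_ok = resolve("Eq (OddList Int)")
```

Without the `seen` check the procedure would alternate between the two list goals forever; with it, the query succeeds after visiting each goal once.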

The diversity of these coinductive examples in the existing literature shows 
that there is a practical demand for coinductive methods in Horn clause logic, 
but it also shows that no unifying proof-theoretic approach exists to allow for a 
generic use of these methods. This causes several problems. 

Problem 1. The existing proof-theoretic coinductive interpretations 
of cycle and loop detection are unclear, incomplete and not uniform. 
Table 1. Examples of greatest (complete) Herbrand models for Horn clauses
γ1, γ2, γ3. The signatures are {a} for the clause γ1 and {a, f} for the others.


Horn clauses                      | γ1 : ∀x. p x → p x | γ2 : ∀x. p (f x) → p x                          | γ3 : ∀x. p x → p (f x)
Greatest Herbrand model           | {p a}              | {p(a), p(f a), p(f (f a)), ...}                 | ∅
Greatest complete Herbrand model  | {p a}              | {p(a), p(f a), p(f (f a)), ..., p(f (f ...))}   | {p(f (f ...))}
CoLP substitution for query p a   | id                 | fails                                           | fails
CoLP substitution for query p x   | id                 | x = f x                                         | x = f x


To see this, consider Table 1, which exemplifies three kinds of circular phenomena
in Horn clauses: The clause γ1 is the easiest case. Its coinductive models
are given by the finite set {p a}. On the other extreme is the clause γ3 that,
just like κ_stream, admits only an infinite formula in its coinductive model. The
intermediate case is γ2, which could be interpreted by an infinite set of finite
formulae in its greatest Herbrand model, or may admit an infinite formula in
its greatest complete Herbrand model. Examples like γ1 appear in Haskell type
class resolution [51], and examples like γ2 in its experimental extensions [37].
Cycle detection would only cover computations for γ1, whereas γ2, γ3 require
some form of loop detection¹. However, CoLP’s loop detection gives confusing
results here. It correctly fails to infer p a from γ3 (no unifier for subgoals p a and
p (f a) exists), but incorrectly fails to infer p a from γ2 (also failing to unify p a
and p (f a)). The latter failure is misleading bearing in mind that p a is in fact in
the coinductive model of γ2. Vice versa, if we interpret the CoLP answer x = f x
as a declaration of an infinite term (f (f ...)) in the model, then CoLP’s answer
for γ3 and p x is exactly correct; however, the same answer is badly incomplete for
the query involving p x and γ2, because γ2 in fact admits other, finite, formulae
in its models. And in some applications, e.g. in Haskell type class inference, a
finite formula would be the only acceptable answer for any query to γ2.

This set of examples shows that loop detection is too coarse a tool to give 
an operational semantics to a diversity of coinductive models. 

Problem 2. Constructive interpretation of coinductive proofs in 
Horn clause logic is unclear. Horn clause logic is known to be a construc- 
tive fragment of FOL. Some applications of Horn clauses rely on this property 
in a crucial way. For example, inference in Haskell type class resolution is con- 
structive: when a certain formula F is inferred, the Haskell compiler in fact 
constructs a proof term that inhabits F seen as type. In our earlier example 
Eq (OddList Int) of the Haskell type classes, Haskell in fact captures the cycle 
by a fixpoint term t and proves that t inhabits the type Eq (OddList Int). 


1 We follow the standard terminology of [74] and say that two formulae F and G form
a cycle if F = G, and a loop if F[θ] = G[θ] for some (possibly circular) unifier θ.

[Figure: cube whose vertices are the logics co-fohc, co-fohh, co-hohc, co-hohh and their fixed point extensions co-fohc_fix, co-fohh_fix, co-hohc_fix, co-hohh_fix, connected by arrows.]

Fig. 1. Cube of logics covered by CUP


Although we know from [36] that these computations are sound relative to great- 
est Herbrand models of Horn clauses, the results of [36] do not extend to Horn 
clauses like γ3 or κ_stream, or generally to Horn clauses modelled by the greatest
complete Herbrand models. This shows that there is not just a need for coinduc- 
tive proofs in Horn clause logic, but constructive coinductive proofs. 

Problem 3. Incompleteness of circular unification for irregular coin- 
ductive data structures. Table 1 already showed some issues with incomplete- 
ness of circular unification. A more famous consequence of it is the failure of cir- 
cular unification to capture irregular terms. This is illustrated by the following 
Horn clause, which defines the infinite stream of successive natural numbers. 


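$\kappa_{\mathrm{from}} : \forall x\,y.\ \mathrm{from}\ (s\ x)\ y \to \mathrm{from}\ x\ (\mathrm{scons}\ x\ y)$

The reductions for from 0 y consist only of irregular (non-unifiable) formulae:


$\mathrm{from}\ 0\ y \ \stackrel{\kappa_{\mathrm{from}}:[\mathrm{scons}\ 0\ y'/y]}{\leadsto}\ \mathrm{from}\ (s\ 0)\ y' \ \stackrel{\kappa_{\mathrm{from}}:[\mathrm{scons}\ (s\ 0)\ y''/y']}{\leadsto}\ \cdots$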


The composition of the computed substitutions would suggest an infinite term 
as answer: from 0 (scons 0 (scons (s0) ...)). However, circular unification no 
longer helps to compute this answer, and CoLP fails. Thus, there is a need for 
more general operational semantics that allows irregular coinductive structures. 


A New Theory of Coinductive Proof Search in Horn Clause Logic 


In this paper, we aim to give a principled and general theory that resolves 
the three problems above. This theory establishes a constructive foundation for 
coinductive resolution and allows us to give proof-theoretic characterisations of 
the approaches that have been proposed throughout the literature. 

To solve Problem 1, we follow the footsteps of the uniform proofs by Miller 
et al. [53,54], who gave a general proof-theoretic account of resolution in first- 
order Horn clause logic (fohc) and three extensions: first-order hereditary Har- 
rop clauses (fohh), higher-order Horn clauses (hohc), and higher-order heredi- 
tary Harrop clauses (hohh). In Sect. 3, we extend uniform proofs with a general 
coinduction proof principle. The resulting framework is called coinductive uni- 
form proofs (CUP). We show how the coinductive extensions of the four logics of 
Miller et al., which we name co-fohc, co-fohh, co-hohc and co-hohh, give a precise 
proof-theoretic characterisation to the different kinds of coinduction described
in the literature. For example, coinductive proofs involving the clauses γ1 and
γ2 belong to co-fohc and co-fohh, respectively. However, proofs involving clauses
like γ3 or κ_stream require in addition fixed point terms to express infinite data.
These extensions are denoted by co-fohc_fix, co-fohh_fix, co-hohc_fix and co-hohh_fix.

Section 3 shows that this yields the cube in Fig. 1, where the arrows show the
increase in logical strength. The invariant search for regular infinite objects done
in CoLP is fully described by the logic co-fohc_fix, including proofs for clauses like
γ3 and κ_stream. An important consequence is that CUP is complete for γ1, γ2,
and γ3, e.g. p a is provable from γ2 in CUP, but not in CoLP.

In tackling Problem 3, we will find that the irregular proofs, such as those
for κ_from, can be given in co-hohh_fix. The stream of successive numbers can be
defined as a higher-order fixed point term $s_{fr} = \mathrm{fix}\ f.\ \lambda x.\ \mathrm{scons}\ x\ (f\ (s\ x))$, and
the proposition $\forall x.\ \mathrm{from}\ x\ (s_{fr}\ x)$ is provable in co-hohh_fix. This requires the use
of higher-order syntax, fixed point terms and the goals of universal shape, which
become available in the syntax of Hereditary Harrop logic.
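
The fixed point term for the stream of successive numbers has a direct lazy reading, sketched below with a Python generator. The encoding of numerals as integers and the helper `from_holds_upto` are assumptions; the helper only tests finitely many unfoldings of κ_from and is no substitute for the coinductive proof.

```python
from itertools import islice

def s_fr(x):
    """Lazy counterpart of fix f. lambda x. scons x (f (s x)) on integers."""
    while True:
        yield x          # head of the stream: scons x ...
        x += 1           # tail is the same stream started at s x

def from_holds_upto(x, stream, n):
    """Check the first n unfoldings of kappa_from: the i-th element of a
    stream y with `from x y` must be x + i."""
    return all(head == x + i for i, head in enumerate(islice(stream, n)))

prefix = list(islice(s_fr(0), 4))
```

No two tails of this stream coincide, which is the irregularity that defeats circular unification: there is no finite set of equations of the form x = scons 0 x that describes it.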

In order to solve Problem 2 and to expose the constructive nature of the 
resulting proof systems, we present in Sect. 4 a coinductive extension of first-order 
intuitionistic logic and its sequent calculus. This extension (iFOL$_\blacktriangleright$) is 
based on the so-called later modality (or Löb modality) known from provability 
logic [16,71], type theory [8,58] and domain theory [20]. However, our way of 
using the later modality to control recursion in first-order proofs is new and 
builds on [13,14]. In the same section we also show that CUP is sound relative 
to iFOL$_\blacktriangleright$, which gives us a handle on the constructive content of CUP. This 
yields, among other consequences, a constructive interpretation of CoLP proofs. 

Section 5 is dedicated to showing soundness of both coinductive proof systems 
relative to complete Herbrand models [52]. The construction of these models is 
carried out by using coalgebras and category theory. This frees us from having to 
use topological methods and will simplify future extensions of the theory to, e.g., 
encompass typed logic programming. It also makes it possible to give original 
and constructive proofs of soundness for both CUP and iFOL$_\blacktriangleright$ in Sect. 5. We 
finish the paper with a discussion of related and future work. 


Originality of the Contribution 


The results of this paper give a comprehensive characterisation of coinductive 
Horn clause theories from the point of view of proof search (by expressing coin- 
ductive proof search and resolution as coinductive uniform proofs), constructive 
proof theory (via a translation into an intuitionistic sequent calculus), and coal- 
gebraic semantics (via coinductive Herbrand models and constructive soundness 
results). Several of the presented results have never appeared before: the coin- 
ductive extension of uniform proofs; characterisation of coinductive properties of 
Horn clause theories in higher-order logic with and without fixed point operators; 
coalgebraic and fibrational view on complete Herbrand models; and soundness of 
an intuitionistic logic with later modality relative to complete Herbrand models. 


2 Preliminaries: Terms and Formulae 


In this section, we set up notation and terminology for the rest of the paper. 
Most of it is standard, and blends together the notation used in [53] and [11]. 


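Definition 1. We define the sets $\mathbb{T}$ of types and $\mathbb{P}$ of proposition types by the 
following grammars, where $\iota$ and $o$ are the base type and base proposition type. 


\[ \mathbb{T} \ni \sigma, \tau ::= \iota \mid \sigma \to \tau \qquad \mathbb{P} \ni \rho ::= o \mid \sigma \to \rho, \quad \sigma \in \mathbb{T} \]


We adopt the usual convention that $\to$ binds to the right. 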


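\[
\frac{c : \tau \in \Sigma}{\Gamma \vdash c : \tau}
\qquad
\frac{x : \tau \in \Gamma}{\Gamma \vdash x : \tau}
\qquad
\frac{\Gamma \vdash M : \sigma \to \tau \quad \Gamma \vdash N : \sigma}{\Gamma \vdash M\,N : \tau}
\]
\[
\frac{\Gamma, x : \sigma \vdash M : \tau}{\Gamma \vdash \lambda x.\,M : \sigma \to \tau}
\qquad
\frac{\Gamma, x : \tau \vdash M : \tau}{\Gamma \vdash \mathrm{fix}\,x.\,M : \tau}
\]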


Fig. 2. Well-formed terms 


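\[
\frac{(p : \tau_1 \to \cdots \to \tau_n \to o) \in \Pi \quad \Gamma \vdash M_1 : \tau_1 \ \cdots\ \Gamma \vdash M_n : \tau_n}{\Gamma \Vdash p\,M_1 \cdots M_n}
\]
\[
\frac{}{\Gamma \Vdash \top}
\qquad
\frac{\Gamma \Vdash \varphi \quad \Gamma \Vdash \psi \quad \Box \in \{\wedge, \vee, \to\}}{\Gamma \Vdash \varphi \mathbin{\Box} \psi}
\qquad
\frac{\Gamma, x : \tau \Vdash \varphi}{\Gamma \Vdash \forall x : \tau.\ \varphi}
\qquad
\frac{\Gamma, x : \tau \Vdash \varphi}{\Gamma \Vdash \exists x : \tau.\ \varphi}
\]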


Fig. 3. Well-formed formulae 


Definition 2. A term signature $\Sigma$ is a set of pairs $c : \tau$, where $\tau \in \mathbb{T}$, and a 
predicate signature is a set $\Pi$ of pairs $p : \rho$ with $\rho \in \mathbb{P}$. The elements in $\Sigma$ and 
$\Pi$ are called term symbols and predicate symbols, respectively. Given term and 
predicate signatures $\Sigma$ and $\Pi$, we refer to the pair $(\Sigma, \Pi)$ as signature. Let Var 
be a countable set of variables, the elements of which we denote by $x, y, \ldots$ We 
call a finite list $\Gamma$ of pairs $x : \tau$ of variables and types a context. The set $\Lambda_\Sigma$ of 
(well-typed) terms over $\Sigma$ is the collection of all $M$ with $\Gamma \vdash M : \tau$ for some 
context $\Gamma$ and type $\tau \in \mathbb{T}$, where $\Gamma \vdash M : \tau$ is defined inductively in Fig. 2. A 
term is called closed if $\vdash M : \tau$, otherwise it is called open. Finally, we let $\Lambda^-_\Sigma$ 
denote the set of all terms $M$ that do not involve fix. 


Definition 3. Let $(\Sigma, \Pi)$ be a signature. We say that $\varphi$ is a (first-order) formula 
in context $\Gamma$, if $\Gamma \Vdash \varphi$ is inductively derivable from the rules in Fig. 3. 


Definition 4. The reduction relation $\longrightarrow$ on terms in $\Lambda_\Sigma$ is given as the 
compatible closure (reduction under applications and binders) of $\beta$- and fix-reduction: 


\[ (\lambda x.\,M)\,N \longrightarrow M[N/x] \qquad \mathrm{fix}\,x.\,M \longrightarrow M[\mathrm{fix}\,x.\,M/x] \]


We denote the reflexive, transitive closure of $\longrightarrow$ by $\twoheadrightarrow$. Two terms $M$ and 
$N$ are called convertible, if $M \equiv N$, where $\equiv$ is the equivalence closure of $\longrightarrow$. 
Conversion of terms extends to formulae in the obvious way: if $M_k \equiv M'_k$ for 
$k = 1, \ldots, n$, then $p\,M_1 \cdots M_n \equiv p\,M'_1 \cdots M'_n$. 


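We will use in the following that the above calculus features subject reduction 
and confluence, cf. [61]: if $\Gamma \vdash M : \tau$ and $M \equiv N$, then $\Gamma \vdash N : \tau$; and $M \equiv N$ 
iff there is a term $P$, such that $M \twoheadrightarrow P$ and $N \twoheadrightarrow P$. 

The order of a type $\tau \in \mathbb{T}$ is given as usual by $\mathrm{ord}(\iota) = 0$ and 
$\mathrm{ord}(\sigma \to \tau) = \max\{\mathrm{ord}(\sigma) + 1, \mathrm{ord}(\tau)\}$. If $\mathrm{ord}(\tau) \le 1$, then the arity of $\tau$ is given by $\mathrm{ar}(\iota) = 0$ 
and $\mathrm{ar}(\iota \to \tau) = \mathrm{ar}(\tau) + 1$. A signature $\Sigma$ is called first-order, if for all $f : \tau \in \Sigma$ 
we have $\mathrm{ord}(\tau) \le 1$. We let the arity of $f$ then be $\mathrm{ar}(\tau)$ and denote it by $\mathrm{ar}(f)$. 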
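
The order and arity of a type can be computed by direct recursion on its structure. The following is a minimal sketch; the class names `Base` and `Arrow` and the function names are illustrative choices and do not appear in the paper.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Base:
    """The base type iota."""


@dataclass(frozen=True)
class Arrow:
    """A function type  source -> target."""
    source: "Type"
    target: "Type"


Type = Union[Base, Arrow]


def order(t: Type) -> int:
    """ord(iota) = 0 and ord(s -> t) = max(ord(s) + 1, ord(t))."""
    if isinstance(t, Base):
        return 0
    return max(order(t.source) + 1, order(t.target))


def arity(t: Type) -> int:
    """ar(iota) = 0 and ar(iota -> t) = ar(t) + 1, only for ord(t) <= 1."""
    if order(t) > 1:
        raise ValueError("arity is only defined for types of order at most 1")
    return 0 if isinstance(t, Base) else 1 + arity(t.target)


iota = Base()
scons_type = Arrow(iota, Arrow(iota, iota))    # scons : iota -> iota -> iota
higher_order = Arrow(Arrow(iota, iota), iota)  # (iota -> iota) -> iota
```

Here `scons_type` has order 1 and arity 2, so a signature containing only such symbols is first-order, whereas `higher_order` has order 2 and no arity.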


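Definition 5. The set of guarded base terms over a first-order signature $\Sigma$ is 
given by the following type-driven rules. 

\[
\frac{x : \tau \in \Gamma \quad \mathrm{ord}(\tau) \le 1}{\Gamma \vdash_g x : \tau}
\qquad
\frac{f : \tau \in \Sigma}{\Gamma \vdash_g f : \tau}
\qquad
\frac{\Gamma \vdash_g M : \sigma \to \tau \quad \Gamma \vdash_g N : \sigma}{\Gamma \vdash_g M\,N : \tau}
\]
\[
\frac{f : \sigma \in \Sigma \quad \mathrm{ord}(\tau) \le 1 \quad \Gamma, x : \tau, y_1 : \iota, \ldots, y_{\mathrm{ar}(\tau)} : \iota \vdash_g M_i : \iota \quad 1 \le i \le \mathrm{ar}(f)}{\Gamma \vdash_g \mathrm{fix}\,x.\,\lambda \vec{y}.\,f\,\vec{M} : \tau}
\]


General guarded terms are terms $M$, such that all fix-subterms are guarded base 
terms, which means that they are generated by the following grammar. 


\[ G ::= M \ (\mbox{with } \vdash_g M : \tau \mbox{ for some type } \tau) \mid c \in \Sigma \mid x \in \mathrm{Var} \mid G\,G \mid \lambda x.\,G \]


Finally, $M$ is a first-order term over $\Sigma$ with $\Gamma \vdash M : \tau$ if $\mathrm{ord}(\tau) \le 1$ and the 
types of all variables occurring in $\Gamma$ are of order 0. We denote the set of guarded 
first-order terms $M$ with $\Gamma \vdash M : \iota$ by $\Lambda^{G,1}_\Sigma(\Gamma)$ and the set of guarded terms in 
$\Gamma$ by $\Lambda^{G}_\Sigma(\Gamma)$. If $\Gamma$ is empty, we just write $\Lambda^{G,1}_\Sigma$ and $\Lambda^{G}_\Sigma$, respectively. 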


Note that an important aspect of guarded terms is that no free variable 
occurs under a fix-operator. Guarded base terms should be seen as specific fixed 
point terms that we will be able to unfold into potentially infinite trees. Guarded 
terms close guarded base terms under operations of the simply typed A-calculus. 


Example 6. Let us provide a few examples that illustrate (first-order) guarded 
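terms. We use the first-order signature $\Sigma = \{\mathrm{scons} : \iota \to \iota \to \iota,\ s : \iota \to \iota,\ 0 : \iota\}$. 


1. Let $s_{fr} = \mathrm{fix}\,f.\,\lambda x.\,\mathrm{scons}\ x\ (f\ (s\ x))$ be the function that computes the 
streams of numerals starting at the given argument. It is easy to show that 
$\vdash_g s_{fr} : \iota \to \iota$ and so $s_{fr}\ 0 \in \Lambda^{G,1}_\Sigma$. 
2. For the same signature $\Sigma$ we also have $x : \iota \vdash_g x : \iota$. Thus $x \in \Lambda^{G,1}_\Sigma(x : \iota)$ 
and $s\ x \in \Lambda^{G,1}_\Sigma(x : \iota)$. 
3. We have $x : \iota \to \iota \vdash_g x\ 0 : \iota$, but $(x\ 0) \notin \Lambda^{G,1}_\Sigma(x : \iota \to \iota)$. 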


The purpose of guarded terms is that these are productive, that is, we can 
reduce them to a term that either has a function symbol at the root or is just 
a variable. In other words, guarded terms have head normal forms: We say that 
a term $M$ is in head normal form, if $M = f\,\vec{N}$ for some $f \in \Sigma$ or if $M = x$ 
for some variable $x$. The following lemma is a technical result that is needed to 
show in Lemma 8 that all guarded terms have a head normal form. 


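Lemma 7. Let $M$ and $N$ be guarded base terms with $\Gamma, x : \sigma \vdash_g M : \tau$ and 
$\Gamma \vdash_g N : \sigma$. Then $M[N/x]$ is a guarded base term with $\Gamma \vdash_g M[N/x] : \tau$. 


Lemma 8. If $M$ is a first-order guarded term with $M \in \Lambda^{G,1}_\Sigma(\Gamma)$, then $M$ 
reduces to a unique head normal form. This means that either (i) there is a 
unique $f \in \Sigma$ and terms $N_1, \ldots, N_{\mathrm{ar}(f)}$ with $\Gamma \vdash_g N_k : \iota$ and $M \twoheadrightarrow f\,\vec{N}$, and 
for all $\vec{L}$, if $M \twoheadrightarrow f\,\vec{L}$, then $\vec{N} \equiv \vec{L}$; or (ii) $M \twoheadrightarrow x$ for some $x : \iota \in \Gamma$. 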
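
To make the head normal forms of Lemma 8 concrete, the following sketch head-reduces terms with β- and fix-reduction and unfolds the stream term from Example 6 (fix f. λx. scons x (f (s x)), called `s_fr` below). The term representation, the function names and the `fuel` bound are assumptions made for illustration. Substitution is deliberately naive (not capture-avoiding), which is only adequate because every term substituted in this example is closed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Con:
    name: str


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


@dataclass(frozen=True)
class Fix:
    var: str
    body: object


def subst(m, x, n):
    """Naive substitution m[n/x]; assumes n is closed (true in this example)."""
    if isinstance(m, Var):
        return n if m.name == x else m
    if isinstance(m, Con):
        return m
    if isinstance(m, App):
        return App(subst(m.fun, x, n), subst(m.arg, x, n))
    if m.var == x:                       # Lam or Fix whose binder shadows x
        return m
    return type(m)(m.var, subst(m.body, x, n))


def head_normal_form(m, fuel=1000):
    """Head-reduce m and return (head, arguments)."""
    args = []
    for _ in range(fuel):
        if isinstance(m, App):           # walk down the application spine
            args.insert(0, m.arg)
            m = m.fun
        elif isinstance(m, Lam) and args:
            m = subst(m.body, m.var, args.pop(0))   # beta-reduction
        elif isinstance(m, Fix):
            m = subst(m.body, m.var, m)             # fix-reduction
        else:
            return m, args
    raise RuntimeError("no head normal form found within the given fuel")


def numeral(m):
    """Read a term of the shape s (s (... 0)) as a Python integer."""
    head, args = head_normal_form(m)
    return 0 if head == Con("0") else 1 + numeral(args[0])


def take(n, stream):
    """First n elements of a stream term built from scons."""
    out = []
    for _ in range(n):
        _, (x, rest) = head_normal_form(stream)
        out.append(numeral(x))
        stream = rest
    return out


zero, s, scons = Con("0"), Con("s"), Con("scons")
s_fr = Fix("f", Lam("x", App(App(scons, Var("x")),
                             App(Var("f"), App(s, Var("x"))))))
```

With these definitions, `take(4, App(s_fr, zero))` unfolds the guarded term four times, while the unguarded term `Fix("x", Var("x"))` never reaches a head normal form and exhausts the fuel.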


We end this section by introducing the notion of an atom and refinements 
thereof. This will enable us to define the different logics and thereby to analyse 
the strength of coinduction hypotheses, which we promised in the introduction. 


Definition 9. A formula $\varphi$ of the shape $\top$ or $p\,M_1 \cdots M_n$ is an atom and a 


- first-order atom, if $p$ and all the terms $M_i$ are first-order; 
- guarded atom, if all terms $M_i$ are guarded; and 
- simple atom, if all terms $M_i$ are non-recursive, that is, are in $\Lambda^-_\Sigma$. 


First-order, guarded and simple atoms are denoted by $\mathrm{At}_1$, $\mathrm{At}^g_\omega$ and $\mathrm{At}^s_\omega$. We 
denote conjunctions of these predicates by $\mathrm{At}^g_1 = \mathrm{At}_1 \cap \mathrm{At}^g_\omega$ and $\mathrm{At}^s_1 = \mathrm{At}_1 \cap \mathrm{At}^s_\omega$. 


Note that the restriction for $\mathrm{At}^g_\omega$ only applies to fixed point terms. Hence, any 
formula that contains terms without fix is already in $\mathrm{At}^g_\omega$, and $\mathrm{At}^g_\omega \cap \mathrm{At}^s_\omega = \mathrm{At}^s_\omega$. 
Since these notions are rather subtle, we give a few examples. 


Example 10. We list three examples of first-order atoms. 


1. For $x : \iota$ we have $\mathrm{stream}\ x \in \mathrm{At}_1$, but there are also “garbage” formulae like 
“$\mathrm{stream}\ (\mathrm{fix}\,x.\,x)$” in $\mathrm{At}_1$. Examples of atoms that are not first-order are 
$p\,M$, where $p : (\iota \to \iota) \to o$ or $x : \iota \to \iota \vdash M : \tau$. 

2. Our running example “$\mathrm{from}\ 0\ (s_{fr}\ 0)$” is a first-order guarded atom in $\mathrm{At}^g_1$. 

3. The formulae in $\mathrm{At}^s_1$ may not contain recursion and higher-order features. 
However, the atoms of Horn clauses in a logic program fit in here. 


3 Coinductive Uniform Proofs 


This section introduces the eight logics of the coinductive uniform proof framework 
announced and motivated in the introduction. The major difference between 
uniform proofs and, say, a sequent calculus is the “uniformity” property, which 
means that the choice of the application of each proof rule is deterministic and 
all proofs are in normal form (cut free). This subsumes the operational semantics 
of resolution, in which the proof search is always goal directed. Hence, the main 
challenge that we set out to solve in this section is to extend the uniform proof 
framework with coinduction, while preserving this valuable operational property. 

We begin by introducing the different goal formulae and definite clauses that 
determine the logics that were presented in the cube for coinductive uniform 
proofs in the introduction. These clauses and formulae correspond directly to 
those of the original work on uniform proofs [53] with the only difference being 
that we need to distinguish atoms with and without fixed point terms. The 
general idea is that goal formulae (G-formulae) occur on the right of a sequent, 
thus are the goal to be proved. Definite clauses (D-formulae), on the other hand, 
are selected from the context as assumptions. This will become clear once we 
introduce the proof system for coinductive uniform proofs. 


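Definition 11. Let $D_i$ be generated by the following grammar with $i \in \{1, \omega\}$. 


\[ D_i ::= \mathrm{At}^s_i \mid G \to D \mid D \wedge D \mid \forall x : \tau.\ D \]


Table 2. D- and G-formulae for coinductive uniform proofs. 


Logic   | Definite Clauses | Goals 
co-fohc | $D_1$            | $G ::= \mathrm{At}^s_1 \mid G \wedge G \mid G \vee G \mid \exists x : \tau.\ G$ 
co-hohc | $D_\omega$       | $G ::= \mathrm{At}^s_\omega \mid G \wedge G \mid G \vee G \mid \exists x : \tau.\ G$ 
co-fohh | $D_1$            | $G ::= \mathrm{At}^s_1 \mid G \wedge G \mid G \vee G \mid \exists x : \tau.\ G \mid D \to G \mid \forall x : \tau.\ G$ 
co-hohh | $D_\omega$       | $G ::= \mathrm{At}^s_\omega \mid G \wedge G \mid G \vee G \mid \exists x : \tau.\ G \mid D \to G \mid \forall x : \tau.\ G$ 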


The sets of definite clauses (D-formulae) and goals (G-formulae) of the four 
logics co-fohc, co-fohh, co-hohc, co-hohh are the well-formed formulae of the 
corresponding shapes defined in Table 2. For the variations co-fohh$_{fix}$ etc. of these 
logics with fixed point terms, we replace upper index “s” with “g” everywhere in 
Table 2. A D-formula of the shape $\forall \vec{x}.\ A_1 \wedge \cdots \wedge A_n \to A_0$ is called H-formula or 
Horn clause if $A_k \in \mathrm{At}^s_1$, and $H^g$-formula if $A_k \in \mathrm{At}^g_1$. Finally, a logic program 
(or program) $P$ is a set of H-formulae. Note that any set of D-formulae in fohc 
can be transformed into an intuitionistically equivalent set of H-formulae [53]. 

We are now ready to introduce the coinductive uniform proofs. Such proofs 
are composed of two parts: an outer coinduction that has to be at the root of 
a proof tree, and the usual uniform proofs by Miller et al. [54]. The 
latter are restated in Fig. 4. Of special notice is the rule DECIDE that mimics the 
operational behaviour of resolution in logic programming, by choosing a clause 
$D$ from the given program to resolve against. The coinduction is started by 
the rule CO-FIX in Fig. 5. Our proof system mimics the typical recursion with a 
guard condition found in coinductive programs and proofs [5,8,19,31,40]. This 
guardedness condition is formalised by applying the guarding modality $\langle\_\rangle$ on 
the formula being proven by coinduction and the proof rules that allow us to 
distribute the guard over certain logical connectives, see Fig. 5. The guarding 
modality may be discharged only if the guarded goal was resolved against a clause 
in the initial program or any hypothesis, except for the coinduction hypotheses. 
This is reflected in the rule DECIDE$\langle\rangle$, where we may only pick a clause from $P$, 
and is in contrast to the rule DECIDE, in which we can pick any hypothesis. The 
proof may only terminate with the INITIAL step if the goal is no longer guarded. 

Note that the CO-FIX rule introduces a goal as a new hypothesis. Hence, 
we have to require that this goal is also a definite clause. Since coinduction 
hypotheses play such an important role, they deserve a separate definition. 


Definition 12. Given a language $L$ from Table 2, a formula $\varphi$ is a 
coinduction goal of $L$ if $\varphi$ simultaneously is a D- and a G-formula of $L$. 
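
The shape conditions behind Definition 12 can be checked mechanically. The sketch below implements the D- and G-grammars of Definition 11 and Table 2 as two mutually recursive predicates. It is a simplification under stated assumptions: atoms are opaque strings, so the distinctions between first-order and higher-order atoms and between simple and guarded atoms are ignored, and the flag `harrop` merely switches between the Horn clause goals (co-fohc, co-hohc) and the hereditary Harrop goals (co-fohh, co-hohh). All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    text: str


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Imp:
    left: object
    right: object


@dataclass(frozen=True)
class Forall:
    var: str
    body: object


@dataclass(frozen=True)
class Exists:
    var: str
    body: object


def is_goal(f, harrop):
    """G ::= At | G and G | G or G | exists x. G  [| D -> G | forall x. G]."""
    if isinstance(f, Atom):
        return True
    if isinstance(f, (And, Or)):
        return is_goal(f.left, harrop) and is_goal(f.right, harrop)
    if isinstance(f, Exists):
        return is_goal(f.body, harrop)
    if harrop and isinstance(f, Forall):
        return is_goal(f.body, harrop)
    if harrop and isinstance(f, Imp):
        return is_definite(f.left, harrop) and is_goal(f.right, harrop)
    return False


def is_definite(f, harrop):
    """D ::= At | G -> D | D and D | forall x. D."""
    if isinstance(f, Atom):
        return True
    if isinstance(f, And):
        return is_definite(f.left, harrop) and is_definite(f.right, harrop)
    if isinstance(f, Forall):
        return is_definite(f.body, harrop)
    if isinstance(f, Imp):
        return is_goal(f.left, harrop) and is_definite(f.right, harrop)
    return False


def is_coinduction_goal(f, harrop):
    """A coinduction goal is simultaneously a D- and a G-formula."""
    return is_definite(f, harrop) and is_goal(f, harrop)


atomic = Atom("eq (odd i)")
general = Forall("x", Imp(Atom("eq x"), Atom("eq (s x)")))
existential = Exists("y", Atom("from 0 y"))
```

Under these assumptions, `atomic` is a coinduction goal in both settings, `general` only with `harrop=True`, and `existential` in neither, since an existential formula is never a D-formula.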


Note that the coinduction goals of co-fohc and co-fohh can be transformed 
into equivalent H- or $H^g$-formulae, since any coinduction goal is a D-formula. 
Let us now formally introduce the coinductive uniform proof system. 


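\[
\frac{\Sigma; P; \Delta \stackrel{D}{\Longrightarrow} A \quad D \in P \cup \Delta}{\Sigma; P; \Delta \Longrightarrow A}\ \mathrm{DECIDE}
\qquad
\frac{A \equiv A'}{\Sigma; P; \Delta \stackrel{A'}{\Longrightarrow} A}\ \mathrm{INITIAL}
\qquad
\frac{}{\Sigma; P; \Delta \Longrightarrow \top}\ \top R
\]
\[
\frac{\Sigma; P; \Delta \stackrel{D}{\Longrightarrow} A \quad \Sigma; P; \Delta \Longrightarrow G}{\Sigma; P; \Delta \stackrel{G \to D}{\Longrightarrow} A}\ {\to} L
\qquad
\frac{\Sigma; P, D; \Delta \Longrightarrow G}{\Sigma; P; \Delta \Longrightarrow D \to G}\ {\to} R
\]
\[
\frac{\Sigma; P; \Delta \stackrel{D_x}{\Longrightarrow} A \quad x \in \{1, 2\}}{\Sigma; P; \Delta \stackrel{D_1 \wedge D_2}{\Longrightarrow} A}\ \wedge L
\qquad
\frac{\Sigma; P; \Delta \Longrightarrow G_1 \quad \Sigma; P; \Delta \Longrightarrow G_2}{\Sigma; P; \Delta \Longrightarrow G_1 \wedge G_2}\ \wedge R
\]
\[
\frac{\Sigma; P; \Delta \stackrel{D[N/x]}{\Longrightarrow} A \quad \emptyset \vdash N : \tau}{\Sigma; P; \Delta \stackrel{\forall x : \tau.\ D}{\Longrightarrow} A}\ \forall L
\qquad
\frac{c : \tau, \Sigma; P; \Delta \Longrightarrow G[c/x] \quad c : \tau \notin \Sigma}{\Sigma; P; \Delta \Longrightarrow \forall x : \tau.\ G}\ \forall R
\]
\[
\frac{\Sigma; P; \Delta \Longrightarrow G[N/x] \quad \emptyset \vdash N : \tau}{\Sigma; P; \Delta \Longrightarrow \exists x : \tau.\ G}\ \exists R
\qquad
\frac{\Sigma; P; \Delta \Longrightarrow G_x \quad x \in \{1, 2\}}{\Sigma; P; \Delta \Longrightarrow G_1 \vee G_2}\ \vee R
\]


Fig. 4. Uniform proof rules 


\[
\frac{\Sigma; P; \varphi \Longrightarrow \langle \varphi \rangle}{\Sigma; P \looparrowright \varphi}\ \mathrm{CO\text{-}FIX}
\]
\[
\frac{\Sigma; P; \Delta \stackrel{D}{\Longrightarrow} A \quad D \in P}{\Sigma; P; \Delta \Longrightarrow \langle A \rangle}\ \mathrm{DECIDE}\langle\rangle
\qquad
\frac{c : \tau, \Sigma; P; \Delta \Longrightarrow \langle \varphi[c/x] \rangle \quad c : \tau \notin \Sigma}{\Sigma; P; \Delta \Longrightarrow \langle \forall x : \tau.\ \varphi \rangle}\ \forall R\langle\rangle
\]
\[
\frac{\Sigma; P; \Delta \Longrightarrow \langle \varphi_1 \rangle \quad \Sigma; P; \Delta \Longrightarrow \langle \varphi_2 \rangle}{\Sigma; P; \Delta \Longrightarrow \langle \varphi_1 \wedge \varphi_2 \rangle}\ \wedge R\langle\rangle
\qquad
\frac{\Sigma; P; \Delta, \varphi_1 \Longrightarrow \langle \varphi_2 \rangle}{\Sigma; P; \Delta \Longrightarrow \langle \varphi_1 \to \varphi_2 \rangle}\ {\to} R\langle\rangle
\]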


Fig. 5. Coinductive uniform proof rules 


Definition 13. Let $P$ and $\Delta$ be finite sets of, respectively, definite clauses and 
coinduction goals, over the signature $\Sigma$, and suppose that $G$ is a goal and $\varphi$ 
is a coinduction goal. A sequent is either a uniform provability sequent of the 
form $\Sigma; P; \Delta \Longrightarrow G$ or $\Sigma; P; \Delta \stackrel{D}{\Longrightarrow} A$ as defined in Fig. 4, or it is a coinductive 
uniform provability sequent of the form $\Sigma; P \looparrowright \varphi$ as defined in Fig. 5. Let $L$ be 
a language from Table 2. We say that $\varphi$ is coinductively provable in $L$, if $P$ is a 
set of D-formulae in $L$, $\varphi$ is a coinduction goal in $L$ and $\Sigma; P \looparrowright \varphi$ holds. 


The logics we have introduced impose different syntactic restrictions on D- 
and G-formulae, and will therefore admit coinduction goals of different strength. 
This ability to explicitly use stronger coinduction hypotheses within a goal- 
directed search was missing in CoLP, for example. And it allows us to account for 
different coinductive properties of Horn clauses as described in the introduction. 
We finish this section by illustrating this strengthening. 

The first example is one for the logic co-fohc, in which we illustrate the 
framework on the problem of type class resolution. 


Example 14. Let us restate the Haskell type class inference problem discussed 
in the introduction in terms of Horn clauses: 
\[ \kappa_i : \mathrm{eq}\ i \]
\[ \kappa_{odd} : \forall x.\ \mathrm{eq}\ x \wedge \mathrm{eq}\ (\mathrm{even}\ x) \to \mathrm{eq}\ (\mathrm{odd}\ x) \]
\[ \kappa_{even} : \forall x.\ \mathrm{eq}\ x \wedge \mathrm{eq}\ (\mathrm{odd}\ x) \to \mathrm{eq}\ (\mathrm{even}\ x) \]


To prove $\mathrm{eq}\ (\mathrm{odd}\ i)$ for this set of Horn clauses, it is sufficient to use this 
formula directly as coinduction hypothesis, as shown in Fig. 6. Note that this 
formula is indeed a coinduction goal of co-fohc, hence we find ourselves in the 
simplest scenario of coinductive proof search. In Table 1, $\gamma_1$ is a representative 
for this kind of coinductive proofs with simplest atomic goals. 
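
The proof search in this simplest scenario can be imitated by a small resolution loop. The sketch below is not the CUP calculus itself but a rough approximation under stated assumptions: goals are ground atoms, the coinduction hypothesis is the initial atomic goal, and the flag `guarded` plays the role of the guarding modality, so the hypothesis may only be used after at least one clause of the program has been resolved against. The term encoding (nested tuples, uppercase strings as clause variables) and the `fuel` bound are illustrative choices.

```python
def match(pattern, term, env):
    """One-way matching of a clause head against a ground term."""
    if isinstance(pattern, str):
        if pattern.isupper():                       # a clause variable
            if pattern in env:
                return env if env[pattern] == term else None
            return {**env, pattern: term}
        return env if pattern == term else None     # a constant
    if isinstance(term, str) or len(pattern) != len(term):
        return None
    for p, t in zip(pattern, term):
        env = match(p, t, env)
        if env is None:
            return None
    return env


def instantiate(pattern, env):
    """Apply the matching substitution to a body atom."""
    if isinstance(pattern, str):
        return env.get(pattern, pattern)
    return tuple(instantiate(p, env) for p in pattern)


def prove(program, goal, hypothesis, guarded=True, fuel=50):
    """Search for a proof of a ground goal, using the hypothesis only unguarded."""
    if fuel == 0:
        return False
    if not guarded and goal == hypothesis:          # use the coinduction hypothesis
        return True
    for head, body in program:                      # resolve against a clause
        env = match(head, goal, {})
        if env is not None and all(
            prove(program, instantiate(b, env), hypothesis, False, fuel - 1)
            for b in body
        ):
            return True
    return False


# The clauses kappa_i, kappa_odd and kappa_even of Example 14.
PROGRAM = [
    (("eq", "i"), []),
    (("eq", ("odd", "X")), [("eq", "X"), ("eq", ("even", "X"))]),
    (("eq", ("even", "X")), [("eq", "X"), ("eq", ("odd", "X"))]),
]

GOAL = ("eq", ("odd", "i"))
```

Calling `prove(PROGRAM, GOAL, hypothesis=GOAL)` succeeds because the search returns to the goal after resolving against the clauses for odd and even, whereas without a hypothesis the same search only runs out of fuel.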

[Fig. 6: proof tree for $\Sigma; P \looparrowright \mathrm{eq}\ (\mathrm{odd}\ i)$. Read from the root, it applies CO-FIX, DECIDE$\langle\rangle$ (with $\kappa_{odd}$), $\forall L$ and ${\to}L$, and then closes the subgoals $\mathrm{eq}\ i$ and $\mathrm{eq}\ (\mathrm{even}\ i)$ with $\wedge R$, DECIDE and INITIAL.] 

Fig. 6. The co-fohc proof for Horn clauses arising from Haskell type class examples. 
$\varphi$ abbreviates the coinduction hypothesis $\mathrm{eq}\ (\mathrm{odd}\ i)$. Note its use in the marked branch. 


It was pointed out in [37] that Haskell’s type class inference can also give rise 
to irregular corecursion. Such cases may require the more general coinduction 
hypothesis (e.g. universal and/or implicative) of co-fohh or co-hohh. The 
set of Horn clauses below is a simplified representation of a problem given in [37]: 


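\[ \kappa_i : \mathrm{eq}\ i \]
\[ \kappa_s : \forall x.\ (\mathrm{eq}\ x) \wedge \mathrm{eq}\ (s\ (g\ x)) \to \mathrm{eq}\ (s\ x) \]
\[ \kappa_g : \forall x.\ \mathrm{eq}\ x \to \mathrm{eq}\ (g\ x) \]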


Trying to prove $\mathrm{eq}\ (s\ i)$ by using $\mathrm{eq}\ (s\ i)$ directly as a coinduction hypothesis 
is doomed to fail, as the coinductive proof search is irregular and this coinduction 
hypothesis would not be applicable in any guarded context. But it is possible 
to prove $\mathrm{eq}\ (s\ i)$ as a corollary of another theorem: $\forall x.\ (\mathrm{eq}\ x) \to \mathrm{eq}\ (s\ x)$. 
Using this formula as coinduction hypothesis leads to a successful proof, which 
we omit here. From this more general goal, we can derive the original goal by 
instantiating the quantifier with $i$ and eliminating the implication with $\kappa_i$. This 
second derivation is sound with respect to the models, as we show in Theorem 34. 


We encounter $\gamma_2$ from Table 1 in a similar situation: To prove $p\,a$, we first 
have to prove $\forall x.\ p\,x$ in co-fohh, and then obtain $p\,a$ as a corollary by appealing 
to Theorem 34. The next example shows that we can cover all cases in Table 1 
by providing a proof in co-hohh$_{fix}$ that involves irregular recursive terms. 


Example 15. Recall the clause $\forall x\,y.\ \mathrm{from}\ (s\ x)\ y \to \mathrm{from}\ x\ (\mathrm{scons}\ x\ y)$ 
that we named $\kappa_{from}$ in the introduction. Proving $\exists y.\ \mathrm{from}\ 0\ y$ is again not 
possible directly. Instead, we can use the term $s_{fr} = \mathrm{fix}\,f.\,\lambda x.\,\mathrm{scons}\ x\ (f\ (s\ x))$ 
from Example 6 and prove $\forall x.\ \mathrm{from}\ x\ (s_{fr}\ x)$ coinductively, as shown in Fig. 7. 
This formula gives a coinduction hypothesis of sufficient generality. Note that 
the correct coinduction hypothesis now requires the fixed point definition of an 
infinite stream of successive numbers and universal quantification in the goal. 
Hence the need for the richer language of co-hohh$_{fix}$. From this more general goal 
we can derive our initial goal $\exists y.\ \mathrm{from}\ 0\ y$ by instantiating $y$ with $s_{fr}\ 0$. 


[Fig. 7: proof tree for $\Sigma; P \looparrowright \forall x.\ \mathrm{from}\ x\ (s_{fr}\ x)$. Read from the root, it applies CO-FIX, $\forall R\langle\rangle$ with a fresh constant $c$, DECIDE$\langle\rangle$ (with $\kappa_{from}$), $\forall L$ (2 times) and ${\to}L$; the remaining subgoal $\mathrm{from}\ (s\ c)\ (s_{fr}\ (s\ c))$ is closed with DECIDE on the coinduction hypothesis, $\forall L$ and INITIAL.] 

Fig. 7. The co-hohh$_{fix}$ proof for $\varphi = \forall x.\ \mathrm{from}\ x\ (s_{fr}\ x)$. Note that the last step of the 
leftmost branch involves $\mathrm{from}\ c\ (\mathrm{scons}\ c\ (s_{fr}\ (s\ c))) \equiv \mathrm{from}\ c\ (s_{fr}\ c)$. 


There are examples of coinductive proofs that require a fixed point definition 
of an infinite stream, but do not require the syntax of higher-order terms or 
hereditary Harrop formulae. Such proofs can be performed in the co-fohc$_{fix}$ logic. 
A good example is a proof that the stream of zeros satisfies the Horn clause 
theory defining the predicate stream in the introduction. The goal $(\mathrm{stream}\ s_0)$, 
with $s_0 = \mathrm{fix}\,x.\,\mathrm{scons}\ 0\ x$, can be proven directly by coinduction. Similarly, one 
can type self-application with the infinite type $a = \mathrm{fix}\,t.\,t \to b$ for some given 
type $b$. The proof for $\mathrm{typed}\ [x : a]\ (\mathrm{app}\ x\ x)\ b$ is then in co-fohc$_{fix}$. Finally, the 
clause $\gamma_3$ is also in this group. More generally, circular unifiers obtained from 
CoLP’s [41] loop detection yield immediately guarded fixed point terms, and 
thus CoLP corresponds to coinductive proofs in the logic co-fohc$_{fix}$. A general 
discussion of Horn clause theories that describe infinite objects was given in [48], 
where the above logic programs were identified as being productive. 


4 Coinductive Uniform Proofs and Intuitionistic Logic 


In the last section, we introduced the framework of coinductive uniform proofs, 
which gives an operational account to proofs for coinductively interpreted logic 
programs. Having this framework at hand, we need to position it in the existing 
ecosystem of logical systems. The goal of this section is to prove that coinductive 
uniform proofs are in fact constructive. We show this by first introducing an 
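extension of intuitionistic first-order logic that allows us to deal with recursive 
proofs for coinductive predicates. Afterwards, we show that coinductive uniform 
proofs are sound relative to this logic by means of a proof tree translation. The 
model-theoretic soundness proofs for both logics will be provided in Sect. 5. 


\[
\frac{\Gamma \Vdash \Delta \quad \varphi \in \Delta}{\Gamma \mid \Delta \vdash \varphi}\ (\mathrm{Proj})
\qquad
\frac{\Gamma \mid \Delta \vdash \varphi' \quad \varphi \equiv \varphi'}{\Gamma \mid \Delta \vdash \varphi}\ (\mathrm{Conv})
\qquad
\frac{\Gamma \Vdash \Delta}{\Gamma \mid \Delta \vdash \top}\ (\top\text{-I})
\]
\[
\frac{\Gamma \mid \Delta \vdash \varphi \quad \Gamma \mid \Delta \vdash \psi}{\Gamma \mid \Delta \vdash \varphi \wedge \psi}\ (\wedge\text{-I})
\qquad
\frac{\Gamma \mid \Delta \vdash \varphi_1 \wedge \varphi_2 \quad i \in \{1, 2\}}{\Gamma \mid \Delta \vdash \varphi_i}\ (\wedge_i\text{-E})
\]
\[
\frac{\Gamma \mid \Delta \vdash \varphi_i \quad \Gamma \Vdash \varphi_j \quad j \ne i}{\Gamma \mid \Delta \vdash \varphi_1 \vee \varphi_2}\ (\vee_i\text{-I})
\qquad
\frac{\Gamma \mid \Delta, \varphi_1 \vdash \psi \quad \Gamma \mid \Delta, \varphi_2 \vdash \psi}{\Gamma \mid \Delta, \varphi_1 \vee \varphi_2 \vdash \psi}\ (\vee\text{-E})
\]
\[
\frac{\Gamma \mid \Delta, \varphi \vdash \psi}{\Gamma \mid \Delta \vdash \varphi \to \psi}\ ({\to}\text{-I})
\qquad
\frac{\Gamma \mid \Delta \vdash \varphi \to \psi \quad \Gamma \mid \Delta \vdash \varphi}{\Gamma \mid \Delta \vdash \psi}\ ({\to}\text{-E})
\]
\[
\frac{\Gamma, x : \tau \mid \Delta \vdash \varphi}{\Gamma \mid \Delta \vdash \forall x : \tau.\ \varphi}\ (\forall\text{-I})
\qquad
\frac{\Gamma \mid \Delta \vdash \forall x : \tau.\ \varphi \quad M : \tau \in \Lambda^{G}_\Sigma(\Gamma)}{\Gamma \mid \Delta \vdash \varphi[M/x]}\ (\forall\text{-E})
\]
\[
\frac{M : \tau \in \Lambda^{G}_\Sigma(\Gamma) \quad \Gamma \mid \Delta \vdash \varphi[M/x]}{\Gamma \mid \Delta \vdash \exists x : \tau.\ \varphi}\ (\exists\text{-I})
\qquad
\frac{\Gamma \Vdash \psi \quad \Gamma, x : \tau \mid \Delta, \varphi \vdash \psi}{\Gamma \mid \Delta, \exists x : \tau.\ \varphi \vdash \psi}\ (\exists\text{-E})
\]


Fig. 8. Intuitionistic rules for standard connectives 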

We begin by introducing an extension of intuitionistic first-order logic with 
the so-called later modality, written $\blacktriangleright$. This modality is the essential ingredient 
that allows us to equip proofs with a controlled form of recursion. The later 
modality stems originally from provability logic, which characterises transitive, 
well-founded Kripke frames [30,72], and thus allows one to carry out induction 
without an explicit induction scheme [16]. Subsequently, the later modality was picked up 
by the type-theoretic community to control recursion in coinductive programming 
[8,9,21,56,58], mostly with the intent to replace syntactic guardedness 
checks for coinductive definitions by type-based checks of well-definedness. 

Formally, the logic iFOL$_\blacktriangleright$ is given by the following definition. 


Definition 16. The formulae of iFOL$_\blacktriangleright$ are given by Definition 3 and the rule: 


\[ \frac{\Gamma \Vdash \varphi}{\Gamma \Vdash \blacktriangleright \varphi} \]


Conversion extends to these formulae in the obvious way. Let $\varphi$ be a formula and 
$\Delta$ a sequence of formulae in iFOL$_\blacktriangleright$. We say $\varphi$ is provable in context $\Gamma$ under 
the assumptions $\Delta$ in iFOL$_\blacktriangleright$, if $\Gamma \mid \Delta \vdash \varphi$ holds. The provability relation $\vdash$ is 
thereby given inductively by the rules in Figs. 8 and 9. 


\[
\frac{\Gamma \mid \Delta \vdash \varphi}{\Gamma \mid \Delta \vdash \blacktriangleright \varphi}\ (\mathrm{Next})
\qquad
\frac{\Gamma \mid \Delta \vdash \blacktriangleright (\varphi \to \psi)}{\Gamma \mid \Delta \vdash \blacktriangleright \varphi \to \blacktriangleright \psi}\ (\mathrm{Mon})
\qquad
\frac{\Gamma \mid \Delta, \blacktriangleright \varphi \vdash \varphi}{\Gamma \mid \Delta \vdash \varphi}\ (\mbox{Löb})
\]


Fig. 9. Rules for the later modality 


The rules in Fig. 8 are the usual rules for intuitionistic first-order logic and 
should come as no surprise. More interesting are the rules in Fig. 9, where the rule 
(Löb) introduces recursion into the proof system. Furthermore, the rule (Mon) 
allows us to distribute the later modality over implication, and consequently 
over conjunction and universal quantification. This is essential in the translation 
in Theorem 18 below. Finally, the rule (Next) gives us the possibility to proceed 
without any recursion, if necessary. 

Note that so far it is not possible to use the assumption $\blacktriangleright \varphi$ introduced in 
the (Löb)-rule. The idea is that the formulae of a logic program provide us the 
obligations that we have to prove, possibly by recursion, in order to prove a 
coinductive predicate. This is cast in the following definition. 


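Definition 17. Given an $H^g$-formula $\varphi$ of the shape $\forall \vec{x}.\ (A_1 \wedge \cdots \wedge A_n) \to \psi$, 
we define its guarding $\overline{\varphi}$ to be $\forall \vec{x}.\ (\blacktriangleright A_1 \wedge \cdots \wedge \blacktriangleright A_n) \to \psi$. For a logic program 
$P$, we define its guarding $\overline{P}$ by guarding each formula in $P$. 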
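
As a small worked instance of this definition, consider the clause for odd from Example 14, written here as $\kappa_{odd}$, together with the fact $\kappa_i$. Guarding places the later modality in front of every atom in the body and leaves the head untouched, so a fact with an empty body is unchanged. The overline notation for the guarding follows the definition above; the instance itself is only an illustration.

```latex
\begin{align*}
\kappa_{odd} &= \forall x.\ \mathrm{eq}\ x \wedge \mathrm{eq}\ (\mathrm{even}\ x) \to \mathrm{eq}\ (\mathrm{odd}\ x) \\
\overline{\kappa_{odd}} &= \forall x.\ \blacktriangleright \mathrm{eq}\ x \wedge \blacktriangleright \mathrm{eq}\ (\mathrm{even}\ x) \to \mathrm{eq}\ (\mathrm{odd}\ x) \\
\overline{\kappa_{i}} &= \kappa_i = \mathrm{eq}\ i
\end{align*}
```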


The translation given in Definition 17 of a logic program into formulae 
that admit recursion corresponds to unfolding a coinductive predicate, cf. [14]. We 
show now how to transform a coinductive uniform proof tree into a proof tree 
in iFOL$_\blacktriangleright$, such that the recursion and guarding mechanisms in both logics 
match up. 


Theorem 18. If $P$ is a logic program over a first-order signature $\Sigma$ and the 
sequent $\Sigma; P \looparrowright \varphi$ is provable in co-hohh$_{fix}$, then $\overline{P} \vdash \varphi$ is provable in iFOL$_\blacktriangleright$. 


To prove this theorem, one uses that each coinductive uniform proof tree 
starts with an initial tree that has an application of the CO-FIX-rule at the 
root and that eliminates the guard by using the rules in Fig. 5. At the leaves 
of this tree, one finds proof trees that proceed only by means of the rules in 
Fig. 4. The initial tree is then translated into a proof tree in iFOL$_\blacktriangleright$ that starts 
with an application of the (Löb)-rule, which corresponds to the CO-FIX-rule, and 
that simultaneously transforms the coinduction hypothesis and applies introduction 
rules for conjunctions etc. This ensures that we can match the coinduction 
hypothesis with the guarded formulae of the program $P$. 

The results of this section show that it is irrelevant whether the guarding 
modality is used on the right (CUP-style) or on the left (iFOL$_\blacktriangleright$-style), as the 
former can be translated into the latter. However, CUP uses the guarding on the 
right to preserve proof uniformity, whereas iFOL$_\blacktriangleright$ extends a general sequent 
calculus. Thus, to obtain the reverse translation, we would have to have an 
admissible cut rule in CUP. The main ingredient to such a cut rule is the ability to 
prove several coinductive statements simultaneously. This is possible in CUP by 
proving the conjunction of these statements. Unfortunately, we cannot eliminate 
such a conjunction into one of its components, since this would require non- 
deterministic guessing in the proof construction, which in turn breaks uniformity. 
Thus, we leave a solution of this problem for future work. 


5 Herbrand Models and Soundness 


In Sect. 4 we showed that coinductive uniform proofs are sound relative to the 
intuitionistic logic iFOL$_\blacktriangleright$. This gives us a handle on the constructive nature of 
coinductive uniform proofs. Since iFOL$_\blacktriangleright$ is a non-standard logic, we still need 
to provide semantics for that logic. We do this by interpreting in Sect. 5.4 the 
formulae of iFOL$_\blacktriangleright$ over the well-known (complete) Herbrand models and prove 
the soundness of the accompanying proof system with respect to these models. 
Although we obtain soundness of coinductive uniform proofs over Herbrand 
models from this, this proof is indirect and does not give a lot of information 
about the models captured by the different calculi co-fohc etc. For this reason, 
we will give in Sect. 5.3 a direct soundness proof for coinductive uniform proofs. 
We also obtain coinduction invariants from this proof for each of the calculi, 
which allows us to describe their proof strength. 


5.1 Coinductive Herbrand Models and Semantics of Terms 


Before we come to the soundness proofs, we introduce in this section (complete) 
Herbrand models by using the terminology of final coalgebras. We then utilise 
this description to give operational and denotational semantics to guarded terms. 
These semantics show that guarded terms allow the description and computation 
of potentially infinite trees. 

The coalgebraic approach has proven very successful both in logic and 
programming [1,75,76]. We will only require very little category-theoretical 
vocabulary and assume that the reader is familiar with the category Set of 
sets and functions, and functors, see for example [12,25,50]. The terminology of 
algebras and coalgebras [4,47,64,65] is given by the following definition. 


Definition 19. A coalgebra for a functor $F : \mathbf{Set} \to \mathbf{Set}$ is a map $c : X \to FX$. 
Given coalgebras $d : Y \to FY$ and $c : X \to FX$, we say that a map $h : Y \to X$ 
is a homomorphism $d \to c$ if $Fh \circ d = c \circ h$. We call a coalgebra $c : X \to FX$ 
final, if for every coalgebra $d$ there is a unique homomorphism $h : d \to c$. We will 
refer to $h$ as the coinductive extension of $d$. 
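
For intuition, the coinductive extension can be computed lazily in the special case of the stream functor FX = A × X, whose final coalgebra consists of the infinite sequences over A. The sketch below is only an illustration of that special case: a coalgebra d is modelled as a Python function from a state to a pair (output, next state), and the infinite sequence is modelled as a generator. The function names are hypothetical.

```python
from itertools import islice


def coinductive_extension(d):
    """Map a coalgebra d : Y -> A x Y to h : Y -> streams over A."""
    def h(y):
        while True:
            a, y = d(y)      # observe one output and move to the next state
            yield a
    return h


def successor_coalgebra(n):
    """A coalgebra on the natural numbers: output n, continue with n + 1."""
    return n, n + 1


naturals_from = coinductive_extension(successor_coalgebra)
first_four = list(islice(naturals_from(0), 4))
```

The homomorphism condition of the definition says here that the head of `naturals_from(n)` is the output of `successor_coalgebra(n)` and that its tail is `naturals_from(n + 1)`.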


The idea of (complete) Herbrand models is that a set of Horn clauses deter- 
mines for each predicate symbol a set of potentially infinite terms. Such terms 
are (potentially infinite) trees, whose nodes are labelled by function symbols and 
whose branching is given by the arity of these function symbols. To be able to 
deal with open terms, we will allow such trees to have leaves labelled by variables. 
Such trees are a final coalgebra for a functor determined by the signature. 


Definition 20. Let $\Sigma$ be a first-order signature. The extension of a first-order 
signature $\Sigma$ is a (polynomial) functor [38] $[\![\Sigma]\!] : \mathbf{Set} \to \mathbf{Set}$ given by 

\[ [\![\Sigma]\!](X) = \coprod_{f \in \Sigma} X^{\mathrm{ar}(f)}, \]

where $\mathrm{ar} : \Sigma \to \mathbb{N}$ is defined in Sect. 2 and $X^n$ is the $n$-fold product of $X$. We 
define for a set $V$ a functor $[\![\Sigma]\!] + V : \mathbf{Set} \to \mathbf{Set}$ by $([\![\Sigma]\!] + V)(X) = [\![\Sigma]\!](X) + V$, 
where $+$ is the coproduct (disjoint union) in $\mathbf{Set}$. 

To make sense of the following definition, we note that we can view $\Pi$ as a 
signature and we thus obtain its extension $[\![\Pi]\!]$. Moreover, we note that the final 
coalgebra of $[\![\Sigma]\!] + V$ exists because $[\![\Sigma]\!]$ is a polynomial functor. 
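
On finite sets, the extension of a signature from Definition 20 can simply be enumerated. The sketch below represents a first-order signature as a dictionary from symbol names to arities, which is an assumption made for illustration, and lists the elements of the extension applied to a finite set X as pairs of a symbol and a tuple of arguments. The second function adds a set V of variables as a tagged disjoint union.

```python
from itertools import product


def extension(arities, xs):
    """Elements of the extension on X: pairs (f, args) with args in X^ar(f)."""
    return [(f, args)
            for f, n in arities.items()
            for args in product(xs, repeat=n)]


def extension_plus(arities, variables, xs):
    """Elements of the extension plus V: tagged disjoint union."""
    return ([("op", f, args) for f, args in extension(arities, xs)]
            + [("var", v) for v in variables])


# The signature of Example 6: scons is binary, s is unary, 0 is a constant.
SIGMA = {"scons": 2, "s": 1, "0": 0}

elements = extension(SIGMA, ["a", "b"])
with_variables = extension_plus(SIGMA, ["x"], ["a", "b"])
```

For a two-element set X this yields 4 + 2 + 1 = 7 elements, one summand per symbol, and one more element once a single variable is added.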


Definition 21. Let $\Sigma$ be a first-order signature. The coterms over $\Sigma$ are the 
final coalgebra $\mathrm{root}_V : \Sigma^\infty(V) \to [\![\Sigma]\!](\Sigma^\infty(V)) + V$. For brevity, we denote the 
coterms with no variables, i.e. $\Sigma^\infty(\emptyset)$, by $\mathrm{root} : \Sigma^\infty \to [\![\Sigma]\!](\Sigma^\infty)$, and call it the 
(complete) Herbrand universe and its elements ground coterms. Finally, we let 
the (complete) Herbrand base $B^\infty$ be the set $[\![\Pi]\!](\Sigma^\infty)$. 


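The construction $\Sigma^\infty(V)$ gives rise to a functor $\Sigma^\infty : \mathbf{Set} \to \mathbf{Set}$, called 
the free completely iterative monad [5]. If there is no ambiguity, we will drop the 
injections $\kappa_i$ when describing elements of $\Sigma^\infty(V)$. Note that $\Sigma^\infty(V)$ is final with 
the property that for every $s \in \Sigma^\infty(V)$ either there are $f \in \Sigma$ and $\vec{t} \in (\Sigma^\infty(V))^{\mathrm{ar}(f)}$ 
with $\mathrm{root}_V(s) = f(\vec{t})$, or there is $x \in V$ with $\mathrm{root}_V(s) = x$. Finality allows us 
to specify unique maps into $\Sigma^\infty(V)$ by giving a coalgebra $X \to [\![\Sigma]\!](X) + V$. In 
particular, one can define for each $\theta : V \to \Sigma^\infty$ the substitution $t[\theta]$ of variables 
in the coterm $t$ by $\theta$ as the coinductive extension of the following coalgebra. 

\[ \Sigma^\infty(V) \stackrel{\mathrm{root}_V}{\longrightarrow} [\![\Sigma]\!](\Sigma^\infty(V)) + V \stackrel{[\mathrm{id},\ \mathrm{root} \circ \theta]}{\longrightarrow} [\![\Sigma]\!](\Sigma^\infty(V)) \]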


Now that we have set up the basic terminology of coalgebras, we can give 
semantics to guarded terms from Definition 5. The idea is that guarded terms 
guarantee that we can always compute with them far enough to find a function 
symbol in head position, see Lemma 8. This function symbol determines then 
the label and branching of a node in the tree generated by a guarded term. If 
the computation reaches a constant or a variable, then we stop creating the tree 
at the present branch. This idea is captured by the following lemma. 


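Lemma 22. There is a map $[\![-]\!]_1 : \Lambda^{G,1}_\Sigma(\Gamma) \to \Sigma^\infty(\Gamma)$ that is unique with 

1. if $M \equiv N$, then $[\![M]\!]_1 = [\![N]\!]_1$, and 

2. for all $M$, if $M \twoheadrightarrow f\,\vec{N}$ then $\mathrm{root}_\Gamma([\![M]\!]_1) = f(\overrightarrow{[\![N]\!]_1})$, and if $M \twoheadrightarrow x$ then 
$\mathrm{root}_\Gamma([\![M]\!]_1) = x$. 

Proof (sketch). By Lemma 8, we can define a coalgebra on the quotient of 
guarded terms by convertibility $c : \Lambda^{G,1}_\Sigma(\Gamma)/{\equiv} \to [\![\Sigma]\!](\Lambda^{G,1}_\Sigma(\Gamma)/{\equiv}) + \Gamma$ with 
$c[M] = f[\vec{N}]$ if $M \twoheadrightarrow f\,\vec{N}$ and $c[M] = x$ if $M \twoheadrightarrow x$. This yields a 
homomorphism $h : \Lambda^{G,1}_\Sigma(\Gamma)/{\equiv} \to \Sigma^\infty(\Gamma)$ and we can define $[\![-]\!]_1 = h \circ [-]$. The rest 
follows from uniqueness of $h$. 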


5.2 Interpretation of Basic Intuitionistic First-Order Formulae 


In this section, we give an interpretation of the formulae in Definition 3, in 
which we restrict ourselves to guarded terms. This interpretation will be relative 
to models in the complete Herbrand universe. Since we later extend these models 
to Kripke models to be able to handle the later modality, we formulate these 
models already now in the language of fibrations [17,46]. 


Definition 23. Let $p : \mathbf{E} \to \mathbf{B}$ be a functor. Given an object $I \in \mathbf{B}$, the fibre 
$\mathbf{E}_I$ above $I$ is the category of objects $A \in \mathbf{E}$ with $p(A) = I$ and morphisms 
$f : A \to B$ with $p(f) = \mathrm{id}_I$. The functor $p$ is a (split) fibration if for every 
morphism $u : I \to J$ in $\mathbf{B}$ there is a functor $u^* : \mathbf{E}_J \to \mathbf{E}_I$, such that $\mathrm{id}_I^* = \mathrm{Id}_{\mathbf{E}_I}$ 
and $(v \circ u)^* = u^* \circ v^*$. We call $u^*$ the reindexing along $u$. 


To give an interpretation of formulae, consider the following category Pred. 


\[
\mathbf{Pred} = \begin{cases}
\text{objects:} & (X, P) \text{ with } X \in \mathbf{Set} \text{ and } P \subseteq X \\
\text{morphisms:} & f \colon (X, P) \to (Y, Q) \text{ is a map } f \colon X \to Y \text{ with } f(P) \subseteq Q
\end{cases}
\]

The functor P: Pred → Set with P(X, P) = X and P(f) = f is a split fibration,
see [46], where the reindexing functor for f: X → Y is given by taking preimages:
f*(Q) = f⁻¹(Q). Note that each fibre Pred_X is isomorphic to the complete
lattice of predicates over X ordered by set inclusion. Thus, we refer to this
fibration as the predicate fibration.
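
The reindexing of the predicate fibration can be tried out on finite sets. The following Python lines are only a minimal sketch: the function names `reindex` and `is_pred_morphism` and the sample sets and maps are illustrative assumptions, not notation from the paper. The sketch checks the two split-fibration laws id* = Id and (g ∘ f)* = f* ∘ g* on one example.

```python
def reindex(f, domain, pred):
    """Reindexing f*(Q) = f^{-1}(Q): elements of `domain` that f maps into `pred`."""
    return frozenset(x for x in domain if f(x) in pred)


def is_pred_morphism(f, P, Q):
    """A map f is a morphism (X, P) -> (Y, Q) exactly when f(P) is contained in Q."""
    return all(f(x) in Q for x in P)


# Hypothetical carrier sets and maps, chosen only for illustration.
X = frozenset(range(6))
Y = frozenset(range(3))


def f(x):  # f: X -> Y
    return x % 3


def g(y):  # g: Y -> Z with Z = {"even", "odd"}
    return "even" if y % 2 == 0 else "odd"


Q = frozenset({"even"})

# Split fibration laws: (g . f)* = f* . g*  and  id* = Id.
composite = reindex(lambda x: g(f(x)), X, Q)
stepwise = reindex(f, X, reindex(g, Y, Q))
identity = reindex(lambda x: x, X, frozenset({1, 4}))
```

By construction, f is a morphism from (X, f*(Q')) to (Y, Q') for every predicate Q' over Y, which is what `is_pred_morphism` tests.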

Let us now expose the logical structure of the predicate fibration. This will 
allow us to conveniently interpret first-order formulae over this fibration, but it 
comes at the cost of having to introduce a good amount of category theoretical 
language. However, doing so will pay off in Sect. 5.4, where we will construct
another fibration out of the predicate fibration. We can then use category theo- 
retical results to show that this new fibration admits the same logical structure 
and allows the interpretation of the later modality. 

The first notion we need is that of fibred products, coproducts and exponents, 
which will allow us to interpret conjunction, disjunction and implication. 


Definition 24. A fibration p: E → B has fibred finite products (1, ×), if each
fibre E_I has finite products (1_I, ×_I) and these are preserved by reindexing: for
all f: I → J, we have f*(1_J) = 1_I and f*(A ×_J B) = f*(A) ×_I f*(B). Fibred
finite coproducts and exponents are defined analogously.


The fibration P is a so-called first-order fibration, which allows us to interpret 
first-order logic, see [46, Def. 4.2.1]. 


Definition 25. A fibration p: E → B is a first-order fibration if²

- B has finite products and the fibres of p are preorders;

- p has fibred finite products (⊤, ∧) and coproducts (⊥, ∨) that distribute;

- p has fibred exponents →; and

- p has existential and universal quantifiers ∃_{I,J} ⊣ π*_{I,J} ⊣ ∀_{I,J} for all
projections π_{I,J}: I × J → I.

A first-order λ-fibration is a first-order fibration with Cartesian closed base B.


² Technically, the quantifiers should also fulfil the Beck-Chevalley and Frobenius
conditions, and the fibration should admit equality. Since these are fulfilled in
all our models and we do not need equality, we will not discuss them here.
The fibration P: Pred → Set is a first-order λ-fibration, as all its fibres are
posets and Set is Cartesian closed; P has fibred finite products (⊤, ∩), given by
⊤_X = X and intersection; fibred distributive coproducts (∅, ∪); fibred exponents
⇒, given by (P ⇒ Q) = {x | if x ∈ P, then x ∈ Q}; and universal and
existential quantifiers given for P ∈ Pred_{X×Y} by

\[
\forall_{X,Y} P = \{x \in X \mid \forall y \in Y.\ (x, y) \in P\}
\qquad
\exists_{X,Y} P = \{x \in X \mid \exists y \in Y.\ (x, y) \in P\}.
\]
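
To make the adjoint characterisation ∃ ⊣ π* ⊣ ∀ from Definition 25 concrete, the following sketch checks it exhaustively on a small finite example. The helper names (`weaken`, `forall`, `exists`, `implies`, `powerset`) and the sets X and Y are assumptions made for illustration only; `weaken` plays the role of reindexing along the projection X × Y → X.

```python
from itertools import chain, combinations


def powerset(s):
    """All subsets of a finite set, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def weaken(X, Y, Q):
    """pi*(Q): reindex a predicate Q over X along the projection X x Y -> X."""
    return frozenset((x, y) for x in X for y in Y if x in Q)


def forall(X, Y, P):
    return frozenset(x for x in X if all((x, y) in P for y in Y))


def exists(X, Y, P):
    return frozenset(x for x in X if any((x, y) in P for y in Y))


def implies(X, P, Q):
    """Fibred exponent over X: elements for which membership in P entails membership in Q."""
    return frozenset(x for x in X if x not in P or x in Q)


# Hypothetical finite carriers.
X = frozenset({0, 1})
Y = frozenset({"a", "b"})
pairs = frozenset((x, y) for x in X for y in Y)

P = frozenset({(0, "a"), (0, "b"), (1, "a")})
all_of_P = forall(X, Y, P)
some_of_P = exists(X, Y, P)

# exists is left adjoint and forall is right adjoint to weakening.
quantifiers_adjoint = all(
    (exists(X, Y, R) <= Q) == (R <= weaken(X, Y, Q))
    and (weaken(X, Y, Q) <= R) == (Q <= forall(X, Y, R))
    for R in powerset(pairs) for Q in powerset(X)
)

# The exponent is right adjoint to intersection within a fibre.
exponent_adjoint = all(
    ((R & A) <= B) == (R <= implies(X, A, B))
    for R in powerset(X) for A in powerset(X) for B in powerset(X)
)
```

The check is a finite sanity test and not a proof; it only covers the chosen carriers.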


The purpose of first-order fibrations is to capture the essentials of first-order
logic, while the λ-part takes care of higher-order features of the term language.
In the following, we interpret types, contexts, guarded terms and formulae in
the fibration P: Pred → Set: We define for types τ and contexts Γ sets ⟦τ⟧ and
⟦Γ⟧; for guarded terms M with Γ ⊢ M : τ we define a map ⟦M⟧: ⟦Γ⟧ → ⟦τ⟧ in
Set; and for a formula Γ ⊩ φ we give a predicate ⟦φ⟧ ∈ Pred_{⟦Γ⟧}.

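The semantics of types and contexts are given inductively in the Cartesian
closed category Set, where the base type ι is interpreted as coterms, as follows.

\[
\begin{aligned}
\llbracket \iota \rrbracket &= \Sigma^\infty & \llbracket \emptyset \rrbracket &= 1 \\
\llbracket \tau \to \sigma \rrbracket &= \llbracket \sigma \rrbracket^{\llbracket \tau \rrbracket} & \llbracket \Gamma, x : \tau \rrbracket &= \llbracket \Gamma \rrbracket \times \llbracket \tau \rrbracket
\end{aligned}
\]

We note that a coterm t ∈ Σ^∞(V) can be seen as a map (Σ^∞)^V → Σ^∞ by
applying a substitution in (Σ^∞)^V to t: σ ↦ t[σ]. In particular, the semantics of a
guarded first-order term M ∈ Λ^{G,1}_Σ(Γ) is equivalently a map ⟦M⟧₁: ⟦Γ⟧ → Σ^∞.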


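We can now extend this map inductively to ⟦M⟧: ⟦Γ⟧ → ⟦τ⟧ for all guarded
terms M ∈ Λ^G_Σ(Γ) with Γ ⊢ M : τ by

\[
\begin{aligned}
\llbracket M \rrbracket(\gamma)(\vec{t}) &= \llbracket M\, \vec{x} \rrbracket_1\big([\vec{x} \mapsto \vec{t}\,]\big) && \Gamma \vdash_g M : \tau \text{ with } \mathrm{ar}(\tau) = |\vec{t}\,| = |\vec{x}| \\
\llbracket c \rrbracket(\gamma)(\vec{t}) &= c\, \vec{t} \\
\llbracket x \rrbracket(\gamma) &= \gamma(x) \\
\llbracket M\, N \rrbracket(\gamma) &= \llbracket M \rrbracket(\gamma)\big(\llbracket N \rrbracket(\gamma)\big) \\
\llbracket \lambda x.\, M \rrbracket(\gamma)(t) &= \llbracket M \rrbracket\big(\gamma[x \mapsto t]\big)
\end{aligned}
\]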


Lemma 26. The mapping ⟦−⟧ is a well-defined function from guarded terms to
functions, such that Γ ⊢ M : τ implies ⟦M⟧: ⟦Γ⟧ → ⟦τ⟧.


Since P: Pred — Set is a first-order fibration, we can interpret inductively 
all logical connectives of the formulae from Definition 3 in this fibration. The only 
case that is missing is the base case of predicate symbols. Their interpretation 
will be given over a Herbrand model that is constructed as the largest fixed point 
of an operator over all predicate interpretations in the Herbrand base. Both the 
operator and the fixed point are the subjects of the following definition. 


Definition 27. We let the set of interpretations ℐ be the powerset P(B^∞) of
the complete Herbrand base. For I ∈ ℐ and p ∈ Π, we denote by I|_p the
interpretation of p in I (the fibre of I above p)

\[
I|_p = \{\vec{t} \in (\Sigma^\infty)^{\mathrm{ar}(p)} \mid p(\vec{t}\,) \in I\}.
\]

Given a set P of H^g-formulae, we define a monotone map Φ_P: ℐ → ℐ by

\[
\Phi_P(I) = \Big\{ \llbracket \psi \rrbracket_1[\theta] \;\Big|\; \big(\forall \vec{x}.\ \textstyle\bigwedge_{k=1}^{n} \varphi_k \to \psi\big) \in P,\ \theta \colon |\vec{x}| \to \Sigma^\infty,\ \forall k.\ \llbracket \varphi_k \rrbracket_1[\theta] \in I \Big\},
\]

where ⟦−⟧₁[θ] is the extension of semantics and substitution from coterms to the
Herbrand base by functoriality of ⟦Π⟧. The (complete) Herbrand model M_P of
P is the largest fixed point of Φ_P, which exists because ℐ is a complete lattice.
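
The difference between the largest and the least fixed point of such an operator can be seen on a toy example. The sketch below is an illustration under strong simplifying assumptions: it uses a finite set of ground propositional clauses in place of H^g-formulae over coterms, and the names `phi`, `iterate` and the sample program are hypothetical. On a finite lattice, iterating the monotone operator from the top element reaches the largest fixed point after finitely many steps.

```python
def phi(program, interp):
    """One step of the operator: heads of clauses whose whole body lies in `interp`."""
    return frozenset(head for body, head in program
                     if all(atom in interp for atom in body))


def iterate(program, start):
    """Apply phi repeatedly until the interpretation no longer changes."""
    current = frozenset(start)
    while True:
        following = phi(program, current)
        if following == current:
            return current
        current = following


# Hypothetical ground program, each clause written as (body, head):
#   p <- p      q <- r      s <-      t <- s
program = [(("p",), "p"), (("r",), "q"), ((), "s"), (("s",), "t")]
base = frozenset({"p", "q", "r", "s", "t"})

largest = iterate(program, base)          # descending iteration from the full base
least = iterate(program, frozenset())     # ascending iteration from the empty set
```

The atom p is supported only by itself, so it belongs to the largest fixed point but not to the least one; this is the coinductive reading of the clause p <- p.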


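Given a formula φ with Γ ⊩ φ that contains only guarded terms, we define
the semantics of φ in Pred from an interpretation I ∈ ℐ inductively as follows.

\[
\begin{aligned}
\llbracket \Gamma \Vdash p\, \vec{M} \rrbracket_I &= \big(\llbracket \vec{M} \rrbracket\big)^{*}(I|_p) \\
\llbracket \Gamma \Vdash \top \rrbracket_I &= \top_{\llbracket \Gamma \rrbracket} \\
\llbracket \Gamma \Vdash \varphi \mathbin{\square} \psi \rrbracket_I &= \llbracket \Gamma \Vdash \varphi \rrbracket_I \mathbin{\square} \llbracket \Gamma \Vdash \psi \rrbracket_I && \square \in \{\wedge, \vee, \to\} \\
\llbracket \Gamma \Vdash Q x : \tau.\ \varphi \rrbracket_I &= Q_{\llbracket \Gamma \rrbracket, \llbracket \tau \rrbracket}\, \llbracket \Gamma, x : \tau \Vdash \varphi \rrbracket_I && Q \in \{\forall, \exists\}
\end{aligned}
\]

Lemma 28. The mapping ⟦−⟧_I is a well-defined function from formulae to
predicates, such that Γ ⊩ φ implies ⟦φ⟧_I ⊆ ⟦Γ⟧ or, equivalently, ⟦φ⟧_I ∈ Pred_{⟦Γ⟧}.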


This concludes the semantics of types, terms and formulae. We now turn to 
show that coinductive uniform proofs are sound for this interpretation. 


5.3 Soundness of Coinductive Uniform Proofs for Herbrand Models 


In this section, we give a direct proof of soundness for the coinductive uniform 
proof system from Sect. 3. Later, we will obtain another soundness result by
combining the proof translation from Theorem 18 with the soundness of iFOL_▶
(Theorems 39 and 42). The purpose of giving a direct soundness proof for uniform
proofs is that it allows the extraction of a coinduction invariant, see Lemma 32.
The main idea is as follows. Given a formula φ and a uniform proof π for
Σ; P ↬ φ, we construct an interpretation I ∈ ℐ that validates φ, i.e. ⟦φ⟧_I = ⊤,
and that is contained in the complete Herbrand model M_P. Combining these
two facts, we obtain that ⟦φ⟧_{M_P} = ⊤, and thus the soundness of uniform proofs.
To show that the constructed interpretation I is contained in M_P, we use
the usual coinduction proof principle, as it is given in the following definition.


Definition 29. An invariant for K ∈ ℐ is a set I ∈ ℐ, such that K ⊆ I and I
is a Φ_P-invariant, that is, I ⊆ Φ_P(I). If K has an invariant, then K ⊆ M_P.
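
The coinduction principle of Definition 29 can be replayed on a finite ground program. As before, this is a hedged sketch: ground propositional clauses stand in for H^g-formulae, and `phi`, `is_invariant`, `largest_fixed_point` and the sample program are illustrative names that do not occur in the paper.

```python
def phi(program, interp):
    """Heads of all clauses whose body is contained in `interp`."""
    return frozenset(head for body, head in program
                     if all(atom in interp for atom in body))


def is_invariant(program, interp):
    """interp is an invariant (post-fixed point) when interp is a subset of phi(interp)."""
    return interp <= phi(program, interp)


def largest_fixed_point(program, base):
    current = frozenset(base)
    while phi(program, current) != current:
        current = phi(program, current)
    return current


# Hypothetical program with mutual recursion:  p <- q,  q <- p,  r <- r2
program = [(("q",), "p"), (("p",), "q"), (("r2",), "r")]
base = frozenset({"p", "q", "r", "r2"})
model = largest_fixed_point(program, base)

K = frozenset({"p"})            # the goal we want to show is in the model
I = frozenset({"p", "q"})       # candidate invariant containing K
```

Here K itself is not an invariant, because p is only justified by q. Enlarging K to I closes the cycle, and I being an invariant places K inside the model.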


Thus, our goal is now to construct an interpretation together with an invari- 
ant. This invariant will essentially collect and iterate all the substitutions that 
appear in a proof. For this we need the ability to compose substitutions of 
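coterms, which we derive from the monad [5] (Σ^∞, η, μ) with μ: Σ^∞Σ^∞ ⇒ Σ^∞.


Definition 30. A (Kleisli-)substitution θ from V to W, written θ: V ⇸ W, is a
map V → Σ^∞(W). Composition of θ: V ⇸ W and δ: U ⇸ V is given by

\[
\theta \circledcirc \delta = U \xrightarrow{\ \delta\ } \Sigma^\infty(V) \xrightarrow{\ \Sigma^\infty(\theta)\ } \Sigma^\infty(\Sigma^\infty(W)) \xrightarrow{\ \mu_W\ } \Sigma^\infty(W).
\]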
The notions in the following definition will allow us to easily organise and
iterate the substitutions that occur in a uniform proof.


Definition 31. Let S be a set with S = {1,...,n} for some n ∈ ℕ. We call
the set S* of lists over S the set of substitution identifiers. Suppose that we
have substitutions θ₀: V ⇸ ∅ and θ_k: V ⇸ V for each k ∈ S. Then we can
define a map Θ: S* → (Σ^∞)^V, which turns each substitution identifier into a
substitution, by iteration from the right:

\[
\Theta(\varepsilon) = \theta_0 \quad\text{and}\quad \Theta(w : k) = \Theta(w) \circledcirc \theta_k
\]
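
The iteration in Definition 31 can be illustrated with finite terms in place of coterms. This is a simplifying assumption: the paper works with possibly infinite trees and the monad multiplication μ, whereas the sketch below uses ordinary first-order substitution on finite terms. The representation (variables as strings, applications as tuples) and the names `subst`, `compose`, `big_theta` are hypothetical.

```python
def subst(term, theta):
    """Apply the substitution theta to a finite term.

    Variables are strings; applications are tuples (symbol, arg1, ..., argn).
    """
    if isinstance(term, str):
        return theta.get(term, term)
    symbol, *args = term
    return (symbol, *(subst(arg, theta) for arg in args))


def compose(theta, delta):
    """Kleisli composition: apply delta first, then theta to every resulting term."""
    return {var: subst(term, theta) for var, term in delta.items()}


def big_theta(word, theta0, thetas):
    """Theta(eps) = theta0 and Theta(w : k) = compose(Theta(w), thetas[k])."""
    result = theta0
    for k in word:
        result = compose(result, thetas[k])
    return result


# Hypothetical substitutions over the single variable x.
theta0 = {"x": ("zero",)}                       # V to the empty set: closed terms only
thetas = {1: {"x": ("s", "x")},                 # x |-> s(x)
          2: {"x": ("pair", "x", "x")}}         # x |-> pair(x, x)

empty_word = big_theta([], theta0, thetas)
twice_one = big_theta([1, 1], theta0, thetas)
one_then_two = big_theta([1, 2], theta0, thetas)
```

Each identifier is read from left to right, and the substitution for the last letter is applied first, matching the clause Θ(w : k) = Θ(w) ⊚ θ_k.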


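After introducing these notations, we can give the outline of the soundness
proof for uniform proofs relative to the complete Herbrand model. Given an
H^g-formula ∀x⃗. φ, we note that a uniform proof π for Σ; P ↬ ∀x⃗. φ starts with

\[
\dfrac{\dfrac{\vec{c} : \iota, \Sigma;\ P;\ \Delta \Longrightarrow \langle \varphi[\vec{c}/\vec{x}] \rangle}{\Sigma;\ P;\ \forall \vec{x}.\ \varphi \Longrightarrow \langle \forall \vec{x}.\ \varphi \rangle}\ \forall R\langle\rangle}{\Sigma;\ P \looparrowright \forall \vec{x}.\ \varphi}\ \text{CO-FIX}
\]

where the eigenvariables in c⃗ are all distinct. Let Σ^c be the signature c⃗ : ι, Σ
and C the set of variables in c⃗. Suppose the following is a valid subtree of π.

\[
\dfrac{\dfrac{\Sigma^c;\ P;\ \Delta \overset{\varphi[\vec{N}/\vec{x}]}{\Longrightarrow} A}{\Sigma^c;\ P;\ \Delta \overset{\forall \vec{x}.\ \varphi}{\Longrightarrow} A}\ \forall L}{\Sigma^c;\ P;\ \Delta \Longrightarrow A}\ \text{DECIDE}
\]

This proof tree gives rise to a substitution δ: C ⇸ C by δ(c) = ⟦N_c⟧, which we
call an agent of π. We let D ⊆ At^g be the set of atoms that are proven in π:

\[
D = \{A \mid \Sigma^c; P; \Delta \Longrightarrow \langle A \rangle \text{ or } \Sigma^c; P; \Delta \Longrightarrow A \text{ appears in } \pi\}
\]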


From the agents and atoms in π we extract an invariant for the goal formula.


Lemma 32. Suppose that φ is an H^g-formula of the form ∀x⃗. A₁ ∧ ⋯ ∧ A_n →
A₀ and that there is a proof π for Σ; P ↬ φ. Let D be the proven atoms in π and
θ₀, ..., θ_s be the agents of π. Define A_k^c = A_k[c⃗/x⃗] and suppose further that I₁
is an invariant for {A_k^c[Θ(ε)] | 1 ≤ k ≤ n}. If we put

\[
I_2 = \bigcup_{w \in S^*} D[\Theta(w)]
\]

then I₁ ∪ I₂ is an invariant for A₀^c[Θ(ε)].

Once we have Lemma 32, the following soundness theorem is easily proven.

Theorem 33. If φ is an H^g-formula and Σ; P ↬ φ, then ⟦φ⟧_{M_P} = ⊤.


Finally, we show that extending logic programs with coinductively proven 
lemmas is sound. This follows easily by coinduction. 
Theorem 34. Let φ be an H^g-formula of the shape ∀x⃗. ψ₁ → ψ₂, such that,
for all substitutions θ, if ⟦ψ₁⟧₁[θ] ∈ M_{P∪{φ}}, then ⟦ψ₁⟧₁[θ] ∈ M_P. Then Σ; P ↬ φ
implies M_{P∪{φ}} = M_P, that is, P ∪ {φ} is a conservative extension of P with
respect to the Herbrand model.


As a corollary we obtain that, if there is a proof for Σ; P ↬ φ, then a proof
for Σ; P, φ ↬ ψ is sound with respect to M_P. Indeed, by Theorem 34 we have
that M_P = M_{P∪{φ}} and by Theorem 33 that Σ; P, φ ↬ ψ is sound with respect
to M_{P∪{φ}}. Thus, the proof of Σ; P, φ ↬ ψ is also sound with respect to M_P.
We use this property implicitly in our running examples, and refer the reader 
to [15,49] for proofs, further examples and discussion. 


5.4 Soundness of iFOL_▶ over Herbrand Models


In this section, we demonstrate how the logic iFOL_▶ can be interpreted over
Herbrand models. Recall that we obtained a fixed point model from the monotone
map Φ_P on interpretations. In what follows, it is crucial that we construct
the greatest fixed point of Φ_P by iteration, cf. [6,32,77]: Let Ord be the class
of all ordinals equipped with their (well-founded) order. We denote by Ord^op
the class of ordinals with their reversed order and define a monotone function
$\overleftarrow{\Phi_P}$: Ord^op → ℐ, where we write the argument ordinal in the subscript, by

\[
\big(\overleftarrow{\Phi_P}\big)_\alpha = \bigcap_{\beta < \alpha} \Phi_P\Big(\big(\overleftarrow{\Phi_P}\big)_\beta\Big).
\]

Note that this definition is well-defined because < is well-founded and because
Φ_P is monotone, see [14]. Since ℐ is a complete lattice, there is an ordinal α such
that $(\overleftarrow{\Phi_P})_\alpha = \Phi_P((\overleftarrow{\Phi_P})_\alpha)$, at which point $(\overleftarrow{\Phi_P})_\alpha$ is the largest fixed
point M_P of Φ_P. In what follows, we will utilise this construction to give
semantics to iFOL_▶.
The fibration P: Pred → Set gives rise to another fibration as follows. We let
$\overline{\mathbf{Pred}}$ be the category of functors (monotone maps) with fixed predicate domain:

\[
\overline{\mathbf{Pred}} = \begin{cases}
\text{objects:} & u \colon \mathrm{Ord}^{\mathrm{op}} \to \mathbf{Pred}, \text{ such that } P \circ u \text{ is constant} \\
\text{morphisms:} & u \to v \text{ are natural transformations } f \colon u \Rightarrow v, \\
& \text{such that } Pf \colon P \circ u \Rightarrow P \circ v \text{ is the identity}
\end{cases}
\]

The fibration $\overline{P}$: $\overline{\mathbf{Pred}}$ → Set is defined by evaluation at any ordinal (here 0),
i.e. by $\overline{P}(u) = P(u(0))$ and $\overline{P}(f) = (Pf)_0$, and reindexing along f: X → Y by
applying the reindexing of P point-wise, i.e. by $f^{\#}(u)_\alpha = f^{*}(u_\alpha)$.

Note that there is a (full) embedding K: Pred → $\overline{\mathbf{Pred}}$ that is given by
K(X, P) = (X, $\overline{P}$) with $\overline{P}_\alpha = P$. One can show [14] that $\overline{P}$ is again a first-order
fibration and that it models the later modality, as in the following theorem.


Theorem 35. The fibration $\overline{P}$ is a first-order fibration. If necessary, we denote
the first-order connectives by $\dot{\top}$, $\dot{\wedge}$ etc. to distinguish them from those in Pred.
Otherwise, we drop the dots. Finite (co)products and quantifiers are given
point-wise, while for X ∈ Set and u, v ∈ $\overline{\mathbf{Pred}}_X$ exponents are given by

\[
(v \mathbin{\dot{\Rightarrow}} u)_\alpha = \bigcap_{\beta \le \alpha} (v_\beta \Rightarrow u_\beta).
\]

There is a fibred functor ▶: $\overline{\mathbf{Pred}}$ → $\overline{\mathbf{Pred}}$ with $\overline{P} \circ ▶ = \overline{P}$ given on objects by

\[
(▶ u)_\alpha = \bigcap_{\beta < \alpha} u_\beta
\]

and a natural transformation next: Id ⇒ ▶ from the identity functor to ▶. The
functor ▶ preserves reindexing, products, exponents and universal quantification:
$▶(f^{\#} u) = f^{\#}(▶ u)$, $▶(u \wedge v) = ▶u \wedge ▶v$, $▶(u^v) \to (▶u)^{▶v}$, $▶(\forall_n u) = \forall_n(▶u)$.
Finally, for all X ∈ Set and u ∈ $\overline{\mathbf{Pred}}_X$, there is löb: (▶u ⇒ u) → u in $\overline{\mathbf{Pred}}_X$.
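
The later modality and the exponent of Theorem 35 can be explored on finite descending chains. The following sketch replaces the class of all ordinals by the finite index set {0, ..., n-1}, which is an assumption made purely for illustration; the function names are hypothetical as well. It checks the maps next: u → ▶u and löb: (▶u ⇒ u) → u, read as pointwise inclusions, for all descending chains of length 3 over a two-element set.

```python
from itertools import product


def later(X, u):
    """(later u)_a is the intersection of u_b for b < a; the empty intersection is X."""
    out = []
    for a in range(len(u)):
        acc = frozenset(X)
        for b in range(a):
            acc &= u[b]
        out.append(acc)
    return out


def implies_chain(X, v, u):
    """(v => u)_a is the intersection over b <= a of the fibrewise exponents v_b => u_b."""
    out = []
    for a in range(len(u)):
        acc = frozenset(X)
        for b in range(a + 1):
            acc &= frozenset(x for x in X if x not in v[b] or x in u[b])
        out.append(acc)
    return out


def leq(u, v):
    """Pointwise inclusion of chains, i.e. existence of a morphism u -> v over X."""
    return all(a <= b for a, b in zip(u, v))


def descending_chains(X, n):
    """All descending chains of length n: each element drops out at some stage."""
    xs = sorted(X)
    for drops in product(range(n + 1), repeat=len(xs)):
        # element xs[i] belongs to stage a exactly when a < drops[i]
        yield [frozenset(x for x, d in zip(xs, drops) if a < d) for a in range(n)]


X = frozenset({0, 1})
u = [frozenset({0, 1}), frozenset({0}), frozenset()]
later_u = later(X, u)

chains = list(descending_chains(X, 3))
next_holds = all(leq(c, later(X, c)) for c in chains)
loeb_holds = all(leq(implies_chain(X, later(X, c), c), c) for c in chains)
```

On a descending chain, ▶ shifts every stage by one step and puts the whole carrier at stage 0, which is why the löb inclusion can be established by induction on the stage.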


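Using the above theorem, we can extend the interpretation of formulae to
iFOL_▶ as follows. Let u: Ord^op → ℐ be a descending sequence of interpretations.
As before, we define the restriction of u to a predicate symbol p ∈ Π by
(u|_p)_α = u_α|_p = {t⃗ | p(t⃗) ∈ u_α}. The semantics of formulae in iFOL_▶ as
objects in $\overline{\mathbf{Pred}}$ is given by the following iterative definition.

\[
\begin{aligned}
\llbracket \Gamma \Vdash p\, \vec{M} \rrbracket_u &= \big(\llbracket \vec{M} \rrbracket\big)^{\#}(u|_p) \\
\llbracket \Gamma \Vdash \top \rrbracket_u &= \dot{\top}_{\llbracket \Gamma \rrbracket} \\
\llbracket \Gamma \Vdash \varphi \mathbin{\square} \psi \rrbracket_u &= \llbracket \Gamma \Vdash \varphi \rrbracket_u \mathbin{\dot{\square}} \llbracket \Gamma \Vdash \psi \rrbracket_u && \square \in \{\wedge, \vee, \to\} \\
\llbracket \Gamma \Vdash Q x : \tau.\ \varphi \rrbracket_u &= \dot{Q}_{\llbracket \Gamma \rrbracket, \llbracket \tau \rrbracket}\, \llbracket \Gamma, x : \tau \Vdash \varphi \rrbracket_u && Q \in \{\forall, \exists\} \\
\llbracket \Gamma \Vdash ▶ \varphi \rrbracket_u &= ▶ \llbracket \Gamma \Vdash \varphi \rrbracket_u
\end{aligned}
\]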


The following lemma is the analogue of Lemma 28 for the interpretation of 
formulae without the later modality. 


Lemma 36. The mapping ⟦−⟧_u is a well-defined map from formulae in iFOL_▶
to sequences of predicates, such that Γ ⊩ φ implies ⟦φ⟧_u ∈ $\overline{\mathbf{Pred}}_{⟦Γ⟧}$.


Lemma 37. All rules of iFOL_▶ are sound with respect to the interpretation
⟦−⟧_u of formulae in $\overline{\mathbf{Pred}}$, that is, if Γ | Δ ⊢ φ, then
(⋀_{ψ∈Δ} ⟦ψ⟧_u ⇒ ⟦φ⟧_u) = ⊤. In particular, Γ ⊢ φ implies ⟦φ⟧_u = ⊤.


The following lemma shows that the guarding of a set of formulae is valid in 
the chain model that they generate. 


Lemma 38. If φ is an H-formula in P, then $\llbracket \overline{\varphi} \rrbracket_{\overleftarrow{\Phi_P}} = \top$.


Combining this with soundness from Lemma 37, we obtain that provability
in iFOL_▶ relative to a logic program P is sound for the model of P.


Theorem 39. For all logic programs P, if Γ | P ⊢ φ then $\llbracket \varphi \rrbracket_{\overleftarrow{\Phi_P}} = \top$.


The final result of this section is to show that the descending chain model,
which we used to interpret formulae of iFOL_▶, is sound and complete for the
fixed point model, which we used to interpret the formulae of coinductive uniform
proofs. This will be proved in Theorem 42 below. The easiest way to prove this
result is by establishing a functor $\overline{\mathbf{Pred}}$ → Pred that maps the chain $\overleftarrow{\Phi_P}$ to
the model M_P, and that preserves and reflects truth of first-order formulae
(Proposition 41). We will phrase the preservation of truth of first-order formulae
by a functor by appealing to the following notion of fibration maps, cf. [46, Def.
4.3.1].
Definition 40. Let p: E → B and q: D → A be fibrations. A fibration map
p → q is a pair (F: E → D, G: B → A) of functors, s.t. q ∘ F = G ∘ p and F
preserves Cartesian morphisms: if f: X → Y in E is Cartesian over p(f), then
F(f) is Cartesian over G(p(f)). (F, G) is a map of first-order (λ-)fibrations, if
p and q are first-order (λ-)fibrations, and F and G preserve this structure.


Let us now construct a first-order λ-fibration map $\overline{\mathbf{Pred}}$ → Pred. We note
that since every fibre of the predicate fibration is a complete lattice, for every
chain u ∈ $\overline{\mathbf{Pred}}_X$ there exists an ordinal α at which u stabilises. This means
that there is a limit lim u of u in Pred_X, which is the largest subset of X, such
that ∀α. lim u ⊆ u_α. This allows us to define a map L: $\overline{\mathbf{Pred}}$ → Pred by

\[
\begin{aligned}
L(X, u) &= (X, \lim u) \\
L\big(f \colon (X, u) \to (Y, v)\big) &= f.
\end{aligned}
\]
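
On finite chains the limit is simply the intersection of all stages, and the adjunction between the constant-chain embedding and the limit can be checked exhaustively. The sketch below again assumes a finite index set instead of all ordinals; `limit`, `constant_chain` and the other helper names are illustrative and not taken from the paper.

```python
from itertools import chain, combinations, product


def powerset(s):
    items = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def descending_chains(X, n):
    """All descending chains of length n over X."""
    xs = sorted(X)
    for drops in product(range(n + 1), repeat=len(xs)):
        yield [frozenset(x for x, d in zip(xs, drops) if a < d) for a in range(n)]


def leq(u, v):
    """Pointwise inclusion of chains."""
    return all(a <= b for a, b in zip(u, v))


def limit(X, u):
    """lim u: the largest subset of X contained in every stage of u."""
    acc = frozenset(X)
    for stage in u:
        acc &= stage
    return acc


def constant_chain(P, n):
    """K(X, P): the chain that is P at every stage."""
    return [frozenset(P)] * n


X = frozenset({0, 1})
n = 3
chains = list(descending_chains(X, n))

# K is left adjoint to L: K(P) <= u exactly when P <= lim u.
adjunction_holds = all(
    leq(constant_chain(P, n), u) == (P <= limit(X, u))
    for P in powerset(X) for u in chains
)

# L preserves fibred binary products: lim of a pointwise intersection
# is the intersection of the limits.
products_preserved = all(
    limit(X, [a & b for a, b in zip(u, v)]) == limit(X, u) & limit(X, v)
    for u in chains for v in chains
)

example = limit(X, [frozenset({0, 1}), frozenset({0}), frozenset({0})])
```

For a descending chain over a finite index set the limit coincides with the last stage, which mirrors the stabilisation argument used above for ordinal-indexed chains.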


In the following proposition, we show that L gives us the ability to express
first-order properties of limits equivalently through their approximating chains.
This, in turn, provides soundness and completeness for the interpretation of the
logic iFOL_▶ over descending chains with respect to the largest Herbrand model.


Proposition 41. L: $\overline{\mathbf{Pred}}$ → Pred, as defined above, is a map of first-order
fibrations. Furthermore, L is right-adjoint to the embedding K: Pred → $\overline{\mathbf{Pred}}$.
Finally, for each p ∈ Π and u ∈ $\overline{\mathbf{Pred}}_{B^\infty}$, we have L(u|_p) = L(u)|_p.


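We get from Proposition 41 soundness and completeness of $\overleftarrow{\Phi_P}$ for Herbrand
models. More precisely, if φ is a formula of plain first-order logic (▶-free), then
its interpretation in the coinductive Herbrand model is true if and only if its
interpretation over the chain approximation of the Herbrand model is true.

Theorem 42. If φ is ▶-free (Definition 3) then $\llbracket \varphi \rrbracket_{\overleftarrow{\Phi_P}} = \top$ if and only if
⟦φ⟧_{M_P} = ⊤.

Proof (sketch). First, one shows for all ▶-free formulae φ that
$L(\llbracket \varphi \rrbracket_{\overleftarrow{\Phi_P}}) = \llbracket \varphi \rrbracket_{M_P}$ by induction on φ and using Proposition 41. Using this
identity and K ⊣ L, the result is then obtained from the following adjoint
correspondence.

\[
\dfrac{\dot{\top} = K(\top) \longrightarrow \llbracket \varphi \rrbracket_{\overleftarrow{\Phi_P}} \quad \text{in } \overline{\mathbf{Pred}}}{\top \longrightarrow L\big(\llbracket \varphi \rrbracket_{\overleftarrow{\Phi_P}}\big) = \llbracket \varphi \rrbracket_{M_P} \quad \text{in } \mathbf{Pred}}
\]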


6 Conclusion, Related Work and the Future 


In this paper, we provided a comprehensive theory of resolution in coinductive
Horn-clause theories and coinductive logic programs. This theory comprises a
uniform proof system that features a form of guarded recursion and that provides
operational semantics for proofs of coinductive predicates. Further, we showed
how to translate proofs in this system into proofs for an extension of intuitionistic
FOL with guarded recursion, and we provided sound semantics for both proof
systems in terms of coinductive Herbrand models. The Herbrand models and
semantics were thereby presented in a modern style that utilises coalgebras and
fibrations to provide a conceptual view on the semantics.


Related Work. It may be surprising that automated proof search for coinductive 
predicates in first-order logic does not have a coherent and comprehensive theory, 
even after three decades [3,60], despite all the attention that it received as pro- 
gramming [2,29,42,44] and proof [33,35,39,40,45,59,64-67] method. The work 
that comes close to algorithmic proof search is the system CIRC [63], but it can- 
not handle general coinductive predicates and corecursive programming. Induc- 
tive and coinductive data types are also being added to SMT solvers [24,62]. 
However, both CIRC and SMT solving are inherently based on classical logic 
and are therefore not suited to situations where proof objects are relevant, like 
programming, type class inference or (dependent) type theory. Moreover, the 
proposed solutions, just like those in [41,69] can only deal with regular data, 
while our approach also works for irregular data, as we saw in the from-example. 

This paper subsumes Haskell type class inference [37,51] and exposes that 
the inference presented in those papers corresponds to coinductive proofs in 
co-fohc and co-hohh. Given that the proof systems proposed in this paper are 
constructive and that uniform proofs provide proofs (type inhabitants) in normal 
form, we could give a propositions-as-types interpretation to all eight coinductive 
uniform proof systems. This was done for co-fohc and co-hohh in [37], but we 
leave the remaining cube from the introduction for future work. 


Future Work. There are several directions that we wish to pursue in the future. 
First, we know that CUP is incomplete for the presented models, as it is
intuitionistic and it lacks an admissible cut rule. The first can be solved by moving
to Kripke/Beth-models, as done by Clouston and Goré [30] for the propositional
part of iFOL_▶. However, the admissible cut rule is more delicate. To obtain
such a rule one has to be able to prove several propositions simultaneously by
coinduction, as discussed at the end of Sect. 4. In general, completeness of
recursive proof systems depends largely on the theory they are applied to, see [70]
and [18]. However, techniques from cyclic proof systems [27,68] may help. We also
aim to extend our ideas to other situations like higher-order Horn clauses [28,43]
and interactive proof assistants [7,10,23,31], typed logic programming, and logic
programming that mixes inductive and coinductive predicates.


Acknowledgements. We would like to thank Damien Pous and the anonymous 
reviewers for their valuable feedback. 


Coinduction in Uniform 809 


References 


1. Abbott, M., Altenkirch, T., Ghani, N.: Containers: constructing strictly positive types. TCS 342(1), 3-27 (2005). https://doi.org/10.1016/j.tcs.2005.06.002

2. Abel, A., Pientka, B., Thibodeau, D., Setzer, A.: Copatterns: programming infinite structures by observations. In: POPL 2013, pp. 27-38 (2013). https://doi.org/10.1145/2429069.2429075

3. Aczel, P.: Non-well-founded sets. Center for the Study of Language and Information, Stanford University (1988)

4. Aczel, P.: Algebras and coalgebras. In: Backhouse, R., Crole, R., Gibbons, J. (eds.) Algebraic and Coalgebraic Methods in the Mathematics of Program Construction. LNCS, vol. 2297, pp. 79-88. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-47797-7_3

5. Aczel, P., Adámek, J., Milius, S., Velebil, J.: Infinite trees and completely iterative theories: a coalgebraic view. TCS 300(1-3), 1-45 (2003). https://doi.org/10.1016/S0304-3975(02)00728-4

6. Adámek, J.: On final coalgebras of continuous functors. Theor. Comput. Sci. 294(1/2), 3-29 (2003). https://doi.org/10.1016/S0304-3975(01)00240-7

7. P.L. group on Agda: Agda Documentation. Technical report, Chalmers and Gothenburg University (2015). http://wiki.portal.chalmers.se/agda/, version 2.4.2.5

8. Appel, A.W., Melliès, P.A., Richards, C.D., Vouillon, J.: A very modal model of a modern, major, general type system. In: POPL, pp. 109-122. ACM (2007). https://doi.org/10.1145/1190216.1190235

9. Atkey, R., McBride, C.: Productive coprogramming with guarded recursion. In: ICFP, pp. 197-208. ACM (2013). https://doi.org/10.1145/2500365.2500597

10. Baelde, D., et al.: Abella: a system for reasoning about relational specifications. J. Formaliz. Reason. 7(2), 1-89 (2014). https://doi.org/10.6092/issn.1972-5787/4650

11. Barendregt, H., Dekkers, W., Statman, R.: Lambda Calculus with Types. Cambridge University Press, Cambridge (2013)

12. Barr, M., Wells, C.: Category Theory for Computing Science. Prentice Hall International Series in Computer Science, 2nd edn. Prentice Hall, Upper Saddle River (1995). http://www.tac.mta.ca/tac/reprints/articles/22/tr22abs.html

13. Basold, H.: Mixed inductive-coinductive reasoning: types, programs and logic. Ph.D. thesis, Radboud University Nijmegen (2018). http://hdl.handle.net/2066/190323

14. Basold, H.: Breaking the Loop: Recursive Proofs for Coinductive Predicates in Fibrations. ArXiv e-prints, February 2018. https://arxiv.org/abs/1802.07143

15. Basold, H., Komendantskaya, E., Li, Y.: Coinduction in uniform: foundations for corecursive proof search with horn clauses. Extended version of this paper. CoRR abs/1811.07644 (2018). http://arxiv.org/abs/1811.07644

16. Beklemishev, L.D.: Parameter free induction and provably total computable functions. TCS 224(1-2), 13-33 (1999). https://doi.org/10.1016/S0304-3975(98)00305-3

17. Bénabou, J.: Fibered categories and the foundations of naive category theory. J. Symb. Logic 50(1), 10-37 (1985). https://doi.org/10.2307/2273784

18. Berardi, S., Tatsuta, M.: Classical system of Martin-Löf's inductive definitions is not equivalent to cyclic proof system. In: Esparza, J., Murawski, A.S. (eds.) FoSSaCS 2017. LNCS, vol. 10203, pp. 301-317. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54458-7_18


19. Birkedal, L., Møgelberg, R.E.: Intensional type theory with guarded recursive types qua fixed points on universes. In: LICS, pp. 213-222. IEEE Computer Society (2013). https://doi.org/10.1109/LICS.2013.27

20. Birkedal, L., Møgelberg, R.E., Schwinghammer, J., Støvring, K.: First steps in synthetic guarded domain theory: step-indexing in the topos of trees. In: Proceedings of LICS 2011, pp. 55-64. IEEE Computer Society (2011). https://doi.org/10.1109/LICS.2011.16

21. Bizjak, A., Grathwohl, H.B., Clouston, R., Møgelberg, R.E., Birkedal, L.: Guarded dependent type theory with coinductive types. In: Jacobs, B., Löding, C. (eds.) FoSSaCS 2016. LNCS, vol. 9634, pp. 20-35. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49630-5_2. https://arxiv.org/abs/1601.01586

22. Bjørner, N., Gurfinkel, A., McMillan, K., Rybalchenko, A.: Horn clause solvers for program verification. In: Beklemishev, L.D., Blass, A., Dershowitz, N., Finkbeiner, B., Schulte, W. (eds.) Fields of Logic and Computation II. LNCS, vol. 9300, pp. 24-51. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23534-9_2

23. Blanchette, J.C., Meier, F., Popescu, A., Traytel, D.: Foundational nonuniform (co)datatypes for Higher-Order Logic. In: LICS 2017, pp. 1-12. IEEE Computer Society (2017). https://doi.org/10.1109/LICS.2017.8005071

24. Blanchette, J.C., Peltier, N., Robillard, S.: Superposition with datatypes and codatatypes. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) IJCAR 2018. LNCS (LNAI), vol. 10900, pp. 370-387. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94205-6_25

25. Borceux, F.: Handbook of Categorical Algebra. Basic Category Theory, vol. 1. Cambridge University Press, Cambridge (2008)

26. Bottu, G., Karachalias, G., Schrijvers, T., Oliveira, B.C.D.S., Wadler, P.: Quantified class constraints. In: Haskell Symposium, pp. 148-161. ACM (2017). https://doi.org/10.1145/3122955.3122967

27. Brotherston, J., Simpson, A.: Sequent calculi for induction and infinite descent. J. Log. Comput. 21(6), 1177-1216 (2011). https://doi.org/10.1093/logcom/exq052

28. Burn, T.C., Ong, C.L., Ramsay, S.J.: Higher-order constrained horn clauses for verification. PACMPL 2(POPL), 11:1-11:28 (2018). https://doi.org/10.1145/3158099

29. Capretta, V.: General Recursion via Coinductive Types. Log. Methods Comput. Sci. 1(2), July 2005. https://doi.org/10.2168/LMCS-1(2:1)2005

30. Clouston, R., Goré, R.: Sequent calculus in the topos of trees. In: Pitts, A. (ed.) FoSSaCS 2015. LNCS, vol. 9034, pp. 133-147. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46678-0_9

31. Coquand, T.: Infinite objects in type theory. In: Barendregt, H., Nipkow, T. (eds.) TYPES 1993. LNCS, vol. 806, pp. 62-78. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-58085-9_72

32. Cousot, P., Cousot, R.: Constructive versions of Tarski's fixed point theorems. Pac. J. Math. 82(1), 43-57 (1979). http://projecteuclid.org/euclid.pjm/1102785059

33. Dax, C., Hofmann, M., Lange, M.: A proof system for the linear time μ-calculus. In: Arun-Kumar, S., Garg, N. (eds.) FSTTCS 2006. LNCS, vol. 4337, pp. 273-284. Springer, Heidelberg (2006). https://doi.org/10.1007/11944836_26

34. van Emden, M., Kowalski, R.: The semantics of predicate logic as a programming language. J. Assoc. Comput. Mach. 23, 733-742 (1976). https://doi.org/10.1145/321978.321991

35. Endrullis, J., Hansen, H.H., Hendriks, D., Polonsky, A., Silva, A.: A coinductive framework for infinitary rewriting and equational reasoning. In: RTA 2015, pp. 143-159 (2015). https://doi.org/10.4230/LIPIcs.RTA.2015.143


36. Farka, F., Komendantskaya, E., Hammond, K.: Coinductive soundness of corecursive type class resolution. In: Hermenegildo, M.V., Lopez-Garcia, P. (eds.) LOPSTR 2016. LNCS, vol. 10184, pp. 311-327. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63139-4_18

37. Fu, P., Komendantskaya, E., Schrijvers, T., Pond, A.: Proof relevant corecursive resolution. In: Kiselyov, O., King, A. (eds.) FLOPS 2016. LNCS, vol. 9613, pp. 126-143. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29604-3_9

38. Gambino, N., Kock, J.: Polynomial functors and polynomial monads. Math. Proc. Cambridge Phil. Soc. 154(1), 153-192 (2013). https://doi.org/10.1017/S0305004112000394

39. Giesl, J., et al.: Analyzing program termination and complexity automatically with AProVE. J. Autom. Reason. 58(1), 3-31 (2017). https://doi.org/10.1007/s10817-016-9388-y

40. Giménez, E.: Structural recursive definitions in type theory. In: Larsen, K.G., Skyum, S., Winskel, G. (eds.) ICALP 1998. LNCS, vol. 1443, pp. 397-408. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0055070

41. Gupta, G., Bansal, A., Min, R., Simon, L., Mallya, A.: Coinductive logic programming and its applications. In: Dahl, V., Niemelä, I. (eds.) ICLP 2007. LNCS, vol. 4670, pp. 27-44. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74610-2_4

42. Hagino, T.: A typed lambda calculus with categorical type constructors. In: Pitt, D.H., Poigné, A., Rydeheard, D.E. (eds.) Category Theory and Computer Science. LNCS, vol. 283, pp. 140-157. Springer, Heidelberg (1987). https://doi.org/10.1007/3-540-18508-9_24

43. Hashimoto, K., Unno, H.: Refinement type inference via horn constraint optimization. In: Blazy, S., Jensen, T. (eds.) SAS 2015. LNCS, vol. 9291, pp. 199-216. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48288-9_12

44. Howard, B.T.: Inductive, coinductive, and pointed types. In: Harper, R., Wexelblat, R.L. (eds.) Proceedings of ICFP 1996, pp. 102-109. ACM (1996). https://doi.org/10.1145/232627.232640

45. Hur, C.K., Neis, G., Dreyer, D., Vafeiadis, V.: The power of parameterization in coinductive proof. In: Proceedings of POPL 2013, pp. 193-206. ACM (2013). https://doi.org/10.1145/2429069.2429093

46. Jacobs, B.: Categorical Logic and Type Theory. Studies in Logic and the Foundations of Mathematics, vol. 141. North Holland, Amsterdam (1999)

47. Jacobs, B.: Introduction to Coalgebra: Towards Mathematics of States and Observation. Cambridge Tracts in Theoretical Computer Science, vol. 59. Cambridge University Press, Cambridge (2016). https://doi.org/10.1017/CBO9781316823187. http://www.cs.ru.nl/B.Jacobs/CLG/JacobsCoalgebraIntro.pdf

48. Komendantskaya, E., Li, Y.: Productive corecursion in logic programming. J. TPLP (ICLP 2017 post-proc.) 17(5-6), 906-923 (2017). https://doi.org/10.1017/S147106841700028X

49. Komendantskaya, E., Li, Y.: Towards coinductive theory exploration in horn clause logic: Position paper. In: Kahsai, T., Vidal, G. (eds.) Proceedings 5th Workshop on Horn Clauses for Verification and Synthesis, HCVS 2018, Oxford, UK, 13th July 2018, vol. 278, pp. 27-33 (2018). https://doi.org/10.4204/EPTCS.278.5

50. Lambek, J., Scott, P.J.: Introduction to Higher-Order Categorical Logic. Cambridge University Press, Cambridge (1988)

51. Lämmel, R., Peyton Jones, S.L.: Scrap your boilerplate with class: extensible generic functions. In: ICFP 2005, pp. 204-215. ACM (2005). https://doi.org/10.1145/1086365.1086391


52. Lloyd, J.W.: Foundations of Logic Programming, 2nd edn. Springer, Heidelberg
(1987). https://doi.org/10.1007/978-3-642-83189-8

53. Miller, D., Nadathur, G.: Programming with Higher-order logic. Cambridge
University Press, Cambridge (2012)

54. Miller, D., Nadathur, G., Pfenning, F., Scedrov, A.: Uniform proofs as a foundation
for logic programming. Ann. Pure Appl. Logic 51(1-2), 125-157 (1991).
https://doi.org/10.1016/0168-0072(91)90068-W

55. Milner, R.: A theory of type polymorphism in programming. J. Comput. Syst. Sci.
17(3), 348-375 (1978). https://doi.org/10.1016/0022-0000(78)90014-4

56. Møgelberg, R.E.: A type theory for productive coprogramming via guarded
recursion. In: CSL-LICS, pp. 71:1-71:10. ACM (2014).
https://doi.org/10.1145/2603088.2603132

57. Nadathur, G., Mitchell, D.J.: System description: Teyjus - a compiler and abstract
machine based implementation of λProlog. CADE-16. LNCS (LNAI), vol. 1632, pp.
287-291. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48660-7_25

58. Nakano, H.: A modality for recursion. In: LICS, pp. 255-266. IEEE Computer
Society (2000). https://doi.org/10.1109/LICS.2000.855774

59. Niwiński, D., Walukiewicz, I.: Games for the μ-Calculus. TCS 163(1&2), 99-116
(1996). https://doi.org/10.1016/0304-3975(95)00136-0

60. Park, D.: Concurrency and automata on infinite sequences. In: Deussen, P. (ed.)
GI-TCS 1981. LNCS, vol. 104, pp. 167-183. Springer, Heidelberg (1981).
https://doi.org/10.1007/BFb0017309

61. Plotkin, G.D.: LCF considered as a programming language. Theor. Comput. Sci.
5(3), 223-255 (1977). https://doi.org/10.1016/0304-3975(77)90044-5

62. Reynolds, A., Kuncak, V.: Induction for SMT solvers. In: D’Souza, D., Lal, A.,
Larsen, K.G. (eds.) VMCAI 2015. LNCS, vol. 8931, pp. 80-98. Springer, Heidelberg
(2015). https://doi.org/10.1007/978-3-662-46081-8_5

63. Roşu, G., Lucanu, D.: Circular coinduction: a proof theoretical foundation. In:
Kurz, A., Lenisa, M., Tarlecki, A. (eds.) CALCO 2009. LNCS, vol. 5728, pp. 127-144.
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03741-2_10

64. Rutten, J.: Universal coalgebra: a theory of systems. TCS 249(1), 3-80 (2000).
https://doi.org/10.1016/S0304-3975(00)00056-6

65. Sangiorgi, D.: Introduction to Bisimulation and Coinduction. Cambridge
University Press, New York (2011)

66. Santocanale, L.: A calculus of circular proofs and its categorical semantics. In:
Nielsen, M., Engberg, U. (eds.) FoSSaCS 2002. LNCS, vol. 2303, pp. 357-371.
Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45931-6_25

67. Santocanale, L.: μ-bicomplete categories and parity games. RAIRO - ITA 36(2),
195-227 (2002). https://doi.org/10.1051/ita:2002010

68. Shamkanov, D.S.: Circular proofs for the Gödel-Löb provability logic. Math. Notes
96(3), 575-585 (2014). https://doi.org/10.1134/S0001434614090326

69. Simon, L., Bansal, A., Mallya, A., Gupta, G.: Co-logic programming: extending
logic programming with coinduction. In: Arge, L., Cachin, C., Jurdziński, T.,
Tarlecki, A. (eds.) ICALP 2007. LNCS, vol. 4596, pp. 472-483. Springer, Heidelberg
(2007). https://doi.org/10.1007/978-3-540-73420-8_42

70. Simpson, A.: Cyclic arithmetic is equivalent to Peano arithmetic. In: Esparza, J.,
Murawski, A.S. (eds.) FoSSaCS 2017. LNCS, vol. 10203, pp. 283-300. Springer,
Heidelberg (2017). https://doi.org/10.1007/978-3-662-54458-7_17

71. Smoryński, C.: Self-Reference and Modal Logic. Universitext. Springer, New York
(1985). https://doi.org/10.1007/978-1-4613-8601-8


72. Solovay, R.M.: Provability interpretations of modal logic. Israel J. Math. 25(3),
287-304 (1976). https://doi.org/10.1007/BF02757006

73. Sulzmann, M., Stuckey, P.J.: HM(X) type inference is CLP(X) solving. J. Funct.
Program. 18(2), 251-283 (2008). https://doi.org/10.1017/S0956796807006569

74. Terese: Term Rewriting Systems. Cambridge University Press, Cambridge (2003)

75. Turner, D.A.: Elementary strong functional programming. In: Hartel, P.H.,
Plasmeijer, R. (eds.) FPLE 1995. LNCS, vol. 1022, pp. 1-13. Springer, Heidelberg
(1995). https://doi.org/10.1007/3-540-60675-0_35

76. van den Berg, B., de Marchi, F.: Non-well-founded trees in categories. Ann. Pure
Appl. Logic 146(1), 40-59 (2007). https://doi.org/10.1016/j.apal.2006.12.001

77. Worrell, J.: On the final sequence of a finitary set functor. Theor. Comput. Sci.
338(1-3), 184-199 (2005). https://doi.org/10.1016/j.tcs.2004.12.009


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 


The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.


Author Index


Accattoli, Beniamino 410
Ahman, Danel 30
Alvarez-Picallo, Mario 525
Ariola, Zena M. 119


Balzer, Stephanie 611
Basold, Henning 783
Besson, Frédéric 499
Bi, Xuan 381
Blazy, Sandrine 499
Bocchi, Laura 583
Boutillier, Pierre 176
Buro, Samuele 293


Castellan, Simon 322
Chopra, Nikita 697
Cristescu, Ioana 176


D’Souza, Deepak 697
Dal Lago, Ugo 263
Dang, Alexandre 499
Downen, Paul 119
Dumitrescu, Victor 30


Eyers-Taylor, Alex 525


Feret, Jérôme 176
Fisher, Kathleen 205
Frumin, Dan 60
Fuhs, Carsten 752


Garg, Deepak 469
Gavazzo, Francesco 263
Giannarakis, Nick 30
Giarrusso, Paolo G. 553
Gilbert, Frederic 440
Gondelman, Léon 60
Gordon, Colin S. 88
Guerrieri, Giulio 410


Hawblitzel, Chris 30
Höfner, Peter 668
Hritcu, Cătălin 30


Igarashi, Atsushi 353


Jensen, Thomas 499
Jourdan, Jacques-Henri 3
Journault, Matthieu 724


Komendantskaya, Ekaterina 783
Kop, Cynthia 752
Krebbers, Robbert 60
Kuru, Ismail 88


Leberle, Maico 410
Li, Yue 783


Markl, Michael 668
Martínez, Guido 30
Mastroeni, Isabella 293
McDermott, Dylan 235
Mével, Glen 3
Miné, Antoine 724
Murgia, Maurizio 583
Mycroft, Alan 235


Narasimhamurthy, Monal 30


Oliveira, Bruno C. d. S. 381
Ong, C.-H. Luke 525
Orchard, Dominic 147
Ouadjaout, Abdelraouf 724


Pai, Rekha 697
Paquet, Hugo 322
Paraskevopoulou, Zoe 30
Patrignani, Marco 469
Peyton Jones, Michael 525

Peyton Jones, Simon 119 
Pfenning, Frank 611 
Pit-Claudel, Clément 30
Pottier, François 3
Protzenko, Jonathan 30 


Ramananandro, Tahina 30 
Rastogi, Aseem 30 
Régis-Gianas, Yann 553 


Sakayori, Ken 640 
Schrijvers, Tom 381 
Schuster, Philipp 553 
Sekiyama, Taro 353 
Sullivan, Zachary 119 
Swamy, Nikhil 30 


Toninho, Bernardo 611 
Tsukada, Takeshi 640 


van Glabbeek, Rob 668 
Vasconcelos, Vasco Thudichum 583 
Vesely, Ferdinand 205 


Wang, Meng 147 
Wilke, Pierre 499 


Xia, Li-yao 147 
Xie, Ningning 381 


Yoshida, Nobuko 583 


